Variation in the rate of ageing
  • Home page
  • Data collation
  • Genetic analysis
  • Source Code

Sections

  • Load packages and data
    • Load variant/gene annotations
    • The raw dataset
    • Line mean data
  • \(\mathrm{CV}_G\)
  • Preparing for univariate GWAS
    • Loading data used in GWA tests
    • Install neccessary software and build helper functions
    • Perform SNP quality control and impute missing data
    • Get minor allele frequencies in the DGRP
    • Create a reduced list of LD-pruned SNPs with PLINK
    • Build GWAS function
    • Build manhattan plot function
  • Preparing for cross phenotype meta-analysis
    • The functions
  • Analysing life expectancy and lifespan equality
    • Run univariate GWAS
    • Applying cross-phenotype meta-analysis
      • Generate the genetic correlation matrix
      • Calculate meta-analytic test statistics
    • Visualise the results
  • Analysing the rate of ageing and baseline mortality
    • Axes of ageing rate and baseline mortality
    • Run univariate GWAS
    • Applying cross-phenotype meta-analysis
      • Generate the genetic correlation matrix
      • Calculate meta-analytic test statistics
    • Visualise the results
  • Are ageing and baseline mortality polygenic?
  • Figure 4

Genome wide analyses

  • Show All Code
  • Hide All Code

  • View Source

Load packages and data

The MASS package is required to run the CPASSOC. Unfortunately this clashes with the dplyr select(). So be prepared to use dplyr::select() to get some things to work if you’re adapting the code for your own use.

Show the code
library(tidyverse) # tidy coding, ggplot etc
library(data.table) # for the rleid function
library(glue) # for coding within strings
library(bigsnpr) # to install: devtools::install_github("privefl/bigsnpr")
#library(pander) # for slick simple tables
library(kableExtra) # for medium sized tables
library(DT) # for large, searchable tables
library(brms) # for bayesian models
library(tidybayes) # for bayesian plots
library(ggtext) # for markdown syntax in ggplot
library(ggnewscale) # to reset colour scales
library(MetBrewer) # for more colour palettes
library(MoMAColors) # nicer colours once again
library(PNWColors) # even more colours
#library(hexbin) # for density heat maps
#library(rcartocolor) # even more nice colours
library(patchwork) # for combining plots
#library(ggrepel) # for labelling ggplots
library(pheatmap) # for heat maps
library(MASS) # needed for CPASSOC
library(Matrix) # needed for CPASSOC
#library(flexsurv) # for survival analysis
#library(rptR) # for finding the intraclass correlation coefficient

# build a helper function that produces a table to display our data

# Create a function to build HTML searchable tables

my_data_table <- function(df){
  datatable(
    df, rownames=FALSE,
    autoHideNavigation = TRUE,
    extensions = c("Scroller",  "Buttons"),
    options = list(
      autoWidth = TRUE,
      dom = 'Bfrtip',
      deferRender=TRUE,
      scrollX=TRUE, scrollY=1000,
      scrollCollapse=TRUE,
      buttons =
        list('pageLength', 'colvis', 'csv', list(
          extend = 'pdf',
          pageSize = 'A4',
          orientation = 'landscape',
          filename = 'Lifespan_data')),
      pageLength = 100
    )
  )
}

Load variant/gene annotations

DGRP variant annotations were downloaded from the DGRP website and gene annotations for all the genes covered by DGRP variants, from the org.Dm.eg.db database object from Bioconductor.

These will be useful later when we aim to identify whether variants with notable associations with a trait overlap with any genes.

Show the code
# Helper function to split a vector into chunks 
chunker <- function(x, max_chunk_size) split(x, ceiling(seq_along(x) / max_chunk_size))

if(!file.exists("data/derived/annotations.csv")){
  
  # Load annotation file, get important info
  
  annot <- read.table("data/input/dgrp.fb557.annot.txt", header = FALSE, stringsAsFactors = FALSE)
  
  get.info <- function(rows){
    lapply(rows, function(row){
      site.class.field <- strsplit(annot$V3[row], split = "]")[[1]][1]
      num.genes <- str_count(site.class.field, ";") + 1
      output <- cbind(rep(annot$V1[row], num.genes), 
                      do.call("rbind", lapply(strsplit(site.class.field, split = ";")[[1]], 
                                              function(x) strsplit(x, split = "[|]")[[1]])))
      if(ncol(output) == 5) return(output[,c(1,2,4,5)]) # only return SNPs that have some annotation. Don't get the gene symbol
      else return(NULL)
    }) %>% do.call("rbind", .)
  }
  
  variant.details <- lapply(chunker(1:nrow(annot), max_chunk_size = 10000), get.info) %>% 
    do.call("rbind", .) %>% as.data.frame()
  
  names(variant.details) <- c("SNP", "FBID", "site.class", "distance.to.gene")
  variant.details$FBID <- unlist(str_extract_all(variant.details$FBID, "FBgn[:digit:]+")) # clean up text strings for Flybase ID
  variant.details %>%
    dplyr::filter(site.class != "FBgn0003638") %>% # NB this is a bug in the DGRP's annotation file
    mutate(chr = str_remove_all(substr(SNP, 1, 2), "_")) # get chromosome now for faster sorting later
  
  annotations <- variant.details
} else annotations <- read_csv("data/derived/annotations.csv")

annotations <-
  annotations %>% 
  left_join(read.csv("data/Input/all_dmel_genes.csv")) %>% 
  dplyr::select(SNP, FBID, site.class, distance.to.gene, gene_name, chromosome)

The raw dataset

Show the code
raw_data <- 
  read_delim("data/Input/Raw_data/all_raw_data.csv",delim=',') %>% 
  mutate(line = as.factor(line),
         Treatment = as.character(Treatment)) %>% 
  unite(Treatment, c("Study", "Treatment", "Sex"), sep = "_") %>% 
  filter(Genotyped == "YES") %>% 
  #unite("Treatment", c(Treatment, Sex), sep = "_") %>% 
  dplyr::select(line, Lifespan, Treatment, Vial_ID)

Line mean data

In the demographic component of this study, we calculated mean values and standard error for each combination of line, sex, study, temperature and mating status. These data are displayed, and can be downloaded from the below table. Note that for quantitative genetic, GWA and other SNP based analysis, we removed lines that had not been genotyped (shown as Genotyped = NO). Lines with unknown genotypes also have unknown Wolbachia and inversions status’. Durham et al (2014) cleared all lines of Wolbachia via treatment with tetracycline.

Show the code
genotyped_lines <- 
  read_csv("data/input/Genotyped_lines.csv") %>% 
  mutate(Genotyped = "YES",
         line = as.factor(line))
  
full_dataset <- 
  read.csv("data/input/lifespan_data.csv") %>% 
  as_tibble() %>% 
  mutate(line = as.factor(Line),
         Treatment = str_replace(Treatment, " ", "_"),
         Treatment = case_when(Temperature == 18 & Study == "Huang_2020" ~ "Huang_2020_1",
                               Temperature == 25 & Study == "Huang_2020" ~ "Huang_2020_2",
                               Temperature == 28 & Study == "Huang_2020" ~ "Huang_2020_3",
                               .default = Treatment)) %>%
  dplyr::select(-Line) %>% 
  left_join(genotyped_lines, by = "line") %>% 
  mutate(Genotyped = if_else(is.na(Genotyped), "NO", Genotyped)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Study, Treatment, Block, e0, SE_e0, h, SE_h, samp, Genotyped)

# DGRP studies often correct for the most common inversions and wolbachia presence. 

inversions_wolbachia <- 
  read_csv("data/Input/inversions_wolbachia.csv") %>%
  mutate(line = as.factor(str_remove(line, "DGRP_")),
         Wolbachia = if_else(Wolbachia == "y", 1, 0),
         across(2:17, ~ case_when(.x == "ST" ~ 0,
                                 .x == "INV/ST" ~ 1,
                                 .x == "INV" ~ 2))) %>% 
  dplyr::select(line, `In(2L)t`, `In(2R)NS`, `In(3R)P`, `In(3R)K`, `In(3R)Mo`, Wolbachia) %>% 
  rename(In_2L_t = `In(2L)t`,
         In_2R_NS = `In(2R)NS`,
         In_3R_P = `In(3R)P`,
         In_3R_K = `In(3R)K`,
         In_3R_Mo = `In(3R)Mo`)
# inversions pruned to those Huang et al 2015 PNAS corrected for

full_dataset <- 
  full_dataset %>% 
  left_join(inversions_wolbachia) %>% 
  mutate(Wolbachia = if_else(Study == "Durham_2014", 0, Wolbachia)) # study cleared wolbachia with tetracycline before phenotyping 
  
my_data_table(full_dataset %>% 
                mutate(across(8:11, ~ round(.x, 2))) %>% 
                dplyr::select(1:13))

\(\mathrm{CV}_G\)

The coefficient of genetic variation is

\[\mathrm{CV}_G = \frac{100\sqrt{\sigma^2_G}}{\overline{x}}\] where \(\sigma^2_G\) is the genetic variance in the trait of interest and \(\overline{x}\) is the mean trait value. This metric allows comparison of genetic variances between traits expressed on different scales. We use it here to get a compare the extent genetic variation in life expectancy and lifespan equality.

First, let’s calculate the \(\mathrm{CV}_G\) in life expectancy using individual-level data. While we’re at it, we can also calculate broad-sense heritability.

Show the code
# get conventional H^2 for lifespan

if(!file.exists("data/Derived/heritability/conventional_e0.csv")){
# Arya females

Arya_2010_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Arya_2010_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Arya_2010_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Arya_2010_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  tibble(e0_heritability = Arya_2010_1_Female_H2_model$R[[1]],
       SE = Arya_2010_1_Female_H2_model$se[1,],
       Treatment = unique(Arya_2010_1_Female_raw$Treatment))

# CVG 

Arya_2010_f_summ <- summary(Arya_2010_1_Female_H2_model$mod)

CV_G <- tibble(V_G = rnorm(4000, mean = 93.55, sd = 9.672),
       mean_trait_value = rnorm(4000, mean = 57.080, sd = 0.776)) %>% 
  mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
  dplyr::select(V_G, CV_G) %>% 
  summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
  mutate(Treatment = unique(Arya_2010_1_Female_raw$Treatment))

# Arya males

Arya_2010_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Arya_2010_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Arya_2010_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Arya_2010_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Arya_2010_1_Male_H2_model$R[[1]],
       SE = Arya_2010_1_Male_H2_model$se[1,],
       Treatment = unique(Arya_2010_1_Male_raw$Treatment))
  )

#CVG 

Arya_2010_m_summ <- summary(Arya_2010_1_Male_H2_model$mod)

CV_G <- CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 98.29, sd = 9.914),
       mean_trait_value = rnorm(4000, mean = 52.9947, sd = 0.7926)) %>% 
  mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
    dplyr::select(V_G, CV_G) %>% 
  summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
  mutate(Treatment = unique(Arya_2010_1_Male_raw$Treatment))
)

# Huang 18C females

Huang_2020_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_1_Female")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_1_Female_H2_model$R[[1]],
       SE = Huang_2020_1_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_1_Female_raw$Treatment))
  )

#CVG

Huang_2020_1_Female_summ <- summary(Huang_2020_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 423.5, sd = 20.58),
                            mean_trait_value = rnorm(4000, mean = 79.143, sd = 1.543)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_1_Female_raw$Treatment))
  )

# Huang 18C males

Huang_2020_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_1_Male_H2_model$R[[1]],
       SE = Huang_2020_1_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_1_Male_raw$Treatment))
  )

#CVG

Huang_2020_1_Male_summ <- summary(Huang_2020_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 461.2, sd = 21.48),
                            mean_trait_value = rnorm(4000, mean = 86.37, sd = 1.61)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_1_Male_raw$Treatment))
  )

# Huang 25C females

Huang_2020_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_2_Female_H2_model$R[[1]],
       SE = Huang_2020_2_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_2_Female_raw$Treatment))
  )

#CVG

Huang_2020_2_Female_summ <- summary(Huang_2020_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 90.46, sd = 9.511),
                            mean_trait_value = rnorm(4000, mean = 42.7445, sd = 0.7069)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_2_Female_raw$Treatment))
  )


# Huang 25C males

Huang_2020_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_2_Male_H2_model$R[[1]],
       SE = Huang_2020_2_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_2_Male_raw$Treatment))
  )

#CVG

Huang_2020_2_Male_summ <- summary(Huang_2020_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 105.3, sd = 10.26),
                            mean_trait_value = rnorm(4000, mean = 45.2978, sd = 0.7611)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_2_Male_raw$Treatment))
  )


# Huang 28C females

Huang_2020_3_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_3_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_3_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_3_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_3_Female_H2_model$R[[1]],
       SE = Huang_2020_3_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_3_Female_raw$Treatment))
  )

#CVG

Huang_2020_3_Female_summ <- summary(Huang_2020_3_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 41.47, sd = 6.440),
                            mean_trait_value = rnorm(4000, mean = 28.207, sd = 0.492)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_3_Female_raw$Treatment))
  )

# Huang 28C males

Huang_2020_3_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_3_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_3_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_3_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_3_Male_H2_model$R[[1]],
       SE = Huang_2020_3_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_3_Male_raw$Treatment))
  )

#CVG

Huang_2020_3_Male_summ <- summary(Huang_2020_3_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 43.91, sd = 6.627),
                            mean_trait_value = rnorm(4000, mean = 27.8709, sd = 0.5054)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_3_Male_raw$Treatment))
  )

# Wilson females 1

Wilson_2020_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Wilson_2020_1_Female")

Wilson_2020_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Wilson_2020_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Wilson_2020_1_Female_H2_model$R[[1]],
       SE = Wilson_2020_1_Female_H2_model$se[1,],
       Treatment = unique(Wilson_2020_1_Female_raw$Treatment))
  )

#CVG

Wilson_2020_1_Female_summ <- summary(Wilson_2020_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 97.04, sd = 9.851),
                            mean_trait_value = rnorm(4000, mean = 40.5357, sd = 0.7809)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Wilson_2020_1_Female_raw$Treatment))
  )

# Wilson females 2

Wilson_2020_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Wilson_2020_2_Female")

Wilson_2020_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Wilson_2020_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Wilson_2020_2_Female_H2_model$R[[1]],
       SE = Wilson_2020_2_Female_H2_model$se[1,],
       Treatment = unique(Wilson_2020_2_Female_raw$Treatment))
  )

#CVG

Wilson_2020_2_Female_summ <- summary(Wilson_2020_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 69.82, sd = 8.356),
                            mean_trait_value = rnorm(4000, mean = 32.2761, sd = 0.6621)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Wilson_2020_2_Female_raw$Treatment))
  )

# Durham females

Durham_2014_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Durham_2014_1_Female")

Durham_2014_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Durham_2014_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Durham_2014_1_Female_H2_model$R[[1]],
       SE = Durham_2014_1_Female_H2_model$se[1,],
       Treatment = unique(Durham_2014_1_Female_raw$Treatment))
  )

#CVG

Durham_2014_1_Female_summ <- summary(Durham_2014_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 76.95, sd = 8.772),
                            mean_trait_value = rnorm(4000, mean = 36.1214, sd = 0.6892)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Durham_2014_1_Female_raw$Treatment))
  )

# Patel females

Patel_2021_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Patel_2021_1_Female")

Patel_2021_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Patel_2021_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Patel_2021_1_Female_H2_model$R[[1]],
       SE = Patel_2021_1_Female_H2_model$se[1,],
       Treatment = unique(Patel_2021_1_Female_raw$Treatment))
  )

#CVG

Patel_2021_1_Female_summ <- summary(Patel_2021_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 145.1, sd = 12.05),
                            mean_trait_value = rnorm(4000, mean = 33.0666, sd = 0.8901)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Patel_2021_1_Female_raw$Treatment))
  )

# Dick 1 females

Dick_2011_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_1_Female_H2_model$R[[1]],
       SE = Dick_2011_1_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_1_Female_raw$Treatment))
  )

#CVG

Dick_2011_1_Female_summ <- summary(Dick_2011_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 34.08, sd = 5.837),
                            mean_trait_value = rnorm(4000, mean = 29.854, sd = 1.019)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_1_Female_raw$Treatment))
  )

# Dick 1 males

Dick_2011_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_1_Male_H2_model$R[[1]],
       SE = Dick_2011_1_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_1_Male_raw$Treatment))
  )

#CVG

Dick_2011_1_Male_summ <- summary(Dick_2011_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 41.15, sd = 6.415),
                            mean_trait_value = rnorm(4000, mean = 27.834, sd = 1.113)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_1_Male_raw$Treatment))
  )

# Dick 2 females

Dick_2011_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_2_Female_H2_model$R[[1]],
       SE = Dick_2011_2_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_2_Female_raw$Treatment))
  )

#CVG

Dick_2011_2_Female_summ <- summary(Dick_2011_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 26.47, sd = 5.145),
                            mean_trait_value = rnorm(4000, mean = 23.1871, sd = 0.8972)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_2_Female_raw$Treatment))
  )

# Dick 2 males

Dick_2011_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_2_Male_H2_model$R[[1]],
       SE = Dick_2011_2_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_2_Male_raw$Treatment))
  )

#CVG

Dick_2011_2_Male_summ <- summary(Dick_2011_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 23.33, sd = 4.831),
                            mean_trait_value = rnorm(4000, mean = 19.9551, sd = 0.8401)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_2_Male_raw$Treatment))
  )

# Dick 3 females

Dick_2011_3_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_3_Female")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_3_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_3_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_3_Female_H2_model$R[[1]],
       SE = Dick_2011_3_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_3_Female_raw$Treatment))
  )

#CVG

Dick_2011_3_Female_summ <- summary(Dick_2011_3_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rexp(4000, rate = 1/5.039), # note the use of rexp instead of rnorm to avoid neg variance values
                            mean_trait_value = rnorm(4000, mean = 30.4241, sd = 0.8665)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_3_Female_raw$Treatment))
  )

# Dick 3 males

Dick_2011_3_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_3_Male")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_3_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_3_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_3_Male_H2_model$R[[1]],
       SE = Dick_2011_3_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_3_Male_raw$Treatment))
  )

#CVG

Dick_2011_3_Male_summ <- summary(Dick_2011_3_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 40.45, sd = 6.360),
                            mean_trait_value = rnorm(4000, mean = 27.75, sd = 2.15)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_3_Male_raw$Treatment))
  )

# Hoffman 1 females

Hoffman_2021_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_1_Female_H2_model$R[[1]],
       SE = Hoffman_2021_1_Female_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_1_Female_raw$Treatment))
  )

#CVG

Hoffman_2021_1_Female_summ <- summary(Hoffman_2021_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 162.1, sd = 12.73),
                            mean_trait_value = rnorm(4000, mean = 53.241, sd = 3.208)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_1_Female_raw$Treatment))
  )

# Hoffman 1 males

Hoffman_2021_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_1_Male_H2_model$R[[1]],
       SE = Hoffman_2021_1_Male_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_1_Male_raw$Treatment))
  )

#CVG

Hoffman_2021_1_Male_summ <- summary(Hoffman_2021_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 195.8, sd = 13.99),
                            mean_trait_value = rnorm(4000, mean = 52.761, sd = 3.516)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_1_Male_raw$Treatment))
  )

# Hoffman 2 females

Hoffman_2021_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_2_Female_H2_model$R[[1]],
       SE = Hoffman_2021_2_Female_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_2_Female_raw$Treatment))
  )

#CVG

Hoffman_2021_2_Female_summ <- summary(Hoffman_2021_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 223.9, sd = 14.96),
                            mean_trait_value = rnorm(4000, mean = 55.492, sd = 4.333)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_2_Female_raw$Treatment))
  )

# Hoffman 2 males

Hoffman_2021_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_2_Male_H2_model$R[[1]],
       SE = Hoffman_2021_2_Male_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_2_Male_raw$Treatment))
  )

#CVG

Hoffman_2021_2_Male_summ <- summary(Hoffman_2021_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 233.8, sd = 15.29),
                            mean_trait_value = rnorm(4000, mean = 59.240, sd = 4.428)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_2_Male_raw$Treatment))
  )

# Zhao 1 females

Zhao_2022_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Zhao_2022_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Zhao_2022_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Zhao_2022_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Zhao_2022_1_Female_H2_model$R[[1]],
       SE = Zhao_2022_1_Female_H2_model$se[1,],
       Treatment = unique(Zhao_2022_1_Female_raw$Treatment))
  )

#CVG

Zhao_2022_1_Female_summ <- summary(Zhao_2022_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 200.51, sd = 14.160),
                            mean_trait_value = rnorm(4000, mean = 58.362, sd = 3.172)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Zhao_2022_1_Female_raw$Treatment))
  )

write_csv(conventional_H2, "data/Derived/heritability/conventional_e0.csv")
write_csv(CV_G, "data/Derived/heritability/conventional_CVG.csv")
} else {
  conventional_H2 <- read_delim("data/Derived/heritability/conventional_e0.csv")
  CV_G <- read_delim("data/Derived/heritability/conventional_CVG.csv")}

We can also calculate \(\mathrm{CV}_G\) directly from line (genotype) means.

Show the code
CVG_data <-
  full_dataset %>% 
  unite(Treatment, c("Treatment", "Sex"), sep = "_") %>% 
  filter(Genotyped == "YES")

# make a function to update the model and the posterior sample output with the desired trait

CVG_e0_calculator <- function(selected_treatment){
  
  data <- CVG_data %>% filter(Treatment == selected_treatment)
  
  model <- update(e0_VG_model, newdata = data)
  
  posterior <- 
    as_draws_df(model) %>%
    dplyr::select(b_Intercept, sigma) %>% 
    mutate(VG = sigma^2,
           CVG = 100 * sqrt(VG) / abs(b_Intercept)) %>%  # Houle 1992
    mutate(Trait = "e0",
           Treatment = selected_treatment)
  
  posterior
}

CVG_h_calculator <- function(selected_treatment){
  
  data <- CVG_data %>% filter(Treatment == selected_treatment) %>% filter(!is.na(SE_h))
  
  model <- update(h_VG_model, newdata = data)
  
  posterior <- 
    as_draws_df(model) %>%
    dplyr::select(b_Intercept, sigma) %>% 
    mutate(VG = sigma^2,
           CVG = 100 * sqrt(VG) / abs(b_Intercept)) %>%  # Houle 1992
    mutate(Trait = "h",
           Treatment = selected_treatment)
  
  posterior
}

treatment_list <- unique(CVG_data$Treatment)

# Run the models

Run_function <- FALSE # Change this to TRUE to run the models

if(Run_function){
  
d <- CVG_data %>% filter(Treatment == "Arya_2010_1_Female")

e0_VG_model <-
  brm(data = d,
      family = gaussian(),
      e0 | mi(SE_e0) ~ 1,
      chains = 4, cores = 4, 
      seed = 1, iter = 6000, warmup = 2000)

h_VG_model <-
  brm(data = d,
      family = gaussian(),
      h | mi(SE_h) ~ 1,
      chains = 4, cores = 4, 
      seed = 1, iter = 6000, warmup = 2000)  
  
CVG_data_e0 <- map_dfr(treatment_list, CVG_e0_calculator)
CVG_data_h <- map_dfr(treatment_list, CVG_h_calculator)

CVG_data <- bind_rows(CVG_data_e0, 
                      CVG_data_h)
  
  CVG_data %>% 
    write_csv("data/Derived/heritability/CVG_data.csv")
} else {
  CVG_data <- read_csv("data/Derived/heritability/CVG_data.csv")
}

CVG_summarised <-
  CVG_data %>% 
  group_by(Trait, Treatment) %>% 
  summarise_draws(mean, sd, ~quantile(.x, probs = c(0.025, 0.975))) %>%
  ungroup() %>% 
  mutate(across(4:7, ~round(.x, 1))) %>% 
  pivot_wider(names_from = "variable", values_from = 4:7)

How do the results of the two methods compare?

Show the code
CVG_comparison <-
  CV_G %>% filter(variable == "CV_G") %>% 
  rename(mean_CVG_conventional = mean,
         `2.5%_CVG conventional` = `2.5%`,
         `97.5%_CVG conventional` = `97.5%`) %>% 
  dplyr::select(-variable) %>% 
  left_join(
    
    CVG_summarised %>% 
      filter(Trait == "e0") %>% 
      rename(line_mean_CVG = mean_CVG,
             `line 2.5%_CVG` = `2.5%_CVG`,
             `line 97.5%_CVG` = `97.5%_CVG`) %>% 
      dplyr::select(Treatment, line_mean_CVG, `line 2.5%_CVG`, `line 97.5%_CVG`)
  )

CVG_comparison %>% 
  ggplot(aes(x = mean_CVG_conventional, y = line_mean_CVG)) +
  geom_abline(intercept = 0, slope = 1, linetype =2) +
  geom_point(size = 2.5) +
  scale_x_continuous(limits = c(0, 40), expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, 40), expand = c(0, 0)) +
  labs(x = "Life expectancy CVG estimated from individual data",
       y = "Life expectancy CVG estimated from line mean data") +
  theme_bw() +
  theme(text = element_text(size = 12))

\(\mathrm{CV}_G\) in life expectancy calculated from individual level data is \(\approx\) \(\mathrm{CV}_G\) calculated from line means. We therefore treat these line mean estimates as reasonable and use them to compare life expectancy and lifespan equality.

Table SX. \(CV_G\) estimates for life expectancy and lifespan equality, estimated from genotype means.

Show the code
CVG_summarised %>% 
  dplyr::select(Trait, Treatment, mean_CVG, `2.5%_CVG`, `97.5%_CVG`) %>% 
  pivot_wider(names_from = Trait, values_from = 3:5) %>% 
  dplyr::select(Treatment, mean_CVG_e0, `2.5%_CVG_e0`, `97.5%_CVG_e0`, mean_CVG_h,
                `2.5%_CVG_h`, `97.5%_CVG_h`) %>%
  rename(`Life expectancy CVG` = mean_CVG_e0,
         `Life expectancy 2.5%`= `2.5%_CVG_e0`,
         `Life expectancy 97.5%`= `97.5%_CVG_e0`,
         `Lifespan equality CVG` = mean_CVG_h,
         `Lifespan equality 2.5%`= `2.5%_CVG_h`,
         `Lifespan equality 97.5%`= `97.5%_CVG_h`) %>% 
  kable() %>% 
  kable_styling()
Treatment Life expectancy CVG Life expectancy 2.5% Life expectancy 97.5% Lifespan equality CVG Lifespan equality 2.5% Lifespan equality 97.5%
Arya_2010_1_Female 17.2 15.3 19.3 17.7 15.2 20.5
Arya_2010_1_Male 18.9 16.8 21.2 19.6 17.0 22.6
Dick_2011_1_Female 23.1 19.1 28.0 20.4 16.0 25.7
Dick_2011_1_Male 26.9 22.3 32.6 22.0 17.5 27.8
Dick_2011_2_Female 23.9 19.7 29.2 23.5 18.8 29.5
Dick_2011_2_Male 26.3 21.8 32.1 28.6 22.7 36.2
Dick_2011_3_Female 11.3 5.9 20.9 11.7 2.5 26.6
Dick_2011_3_Male 26.1 15.7 45.1 23.7 12.5 45.8
Durham_2014_1_Female 26.1 24.2 28.1 20.0 17.2 23.1
Hoffman_2021_1_Female 23.3 18.0 30.9 21.8 15.8 30.2
Hoffman_2021_1_Male 27.7 21.4 36.6 14.6 10.2 20.7
Hoffman_2021_2_Female 26.8 20.1 36.6 16.5 11.5 23.8
Hoffman_2021_2_Male 26.3 19.5 36.1 16.5 11.4 23.7
Huang_2020_1_Female 26.2 23.5 29.4 25.8 22.8 29.3
Huang_2020_1_Male 25.0 22.4 28.0 23.8 21.2 26.9
Huang_2020_2_Female 22.4 20.0 25.0 20.5 18.1 23.1
Huang_2020_2_Male 22.8 20.4 25.5 18.1 16.0 20.4
Huang_2020_3_Female 23.0 20.6 25.9 27.1 23.9 30.7
Huang_2020_3_Male 24.0 21.4 26.9 26.5 23.4 30.0
Patel_2021_1_Female 36.4 32.4 40.9 29.4 25.7 33.7
Wilson_2020_1_Female 25.2 22.9 27.9 28.2 25.4 31.3
Wilson_2020_2_Female 27.8 25.1 30.8 23.6 21.2 26.2
Zhao_2022_1_Female 24.8 18.0 34.7 11.9 7.9 17.7

Preparing for univariate GWAS

The preparation of data for univariate GWAS generally follows Holman and Wong’s (2023) DGRP GWAS of fitness in different contexts. See their associated workflowr report for a comprehensive breakdown of their data preparation.

Loading data used in GWA tests

For GWAS and later CPASSOC, we split the data by study, removed studies that phenotyped < 100 lines and adjusted line means to account for experimental block where applicable. Importantly, we also split the Wilson et al (2020) data by dietary treatment; while we do not explicitly consider diet in our analysis, lifespan in one dietary treatment is considered a separate trait from lifespan measured in a second dietary treatment.

Show the code
Arya_2010_f <-
  full_dataset %>% 
  filter(Study == "Arya_2010" & Sex == "Female" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Arya_2010_m <-
  full_dataset %>% 
  filter(Study == "Arya_2010" & Sex == "Male" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_18 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 18 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_18 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 18 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_25 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 25 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_25 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 25 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_28 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 28 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_28 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 28 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

# In this study, some lines were measured twice per treatment, and a small subset were measured three times. We take the mean across blocks as the line mean, following the original study.

Wilson_2020_f_1 <-
  full_dataset %>% 
  filter(Treatment == "Wilson_2020_1" & Genotyped == "YES") %>%
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Wilson_2020_f_2 <-
  full_dataset %>% 
  filter(Treatment == "Wilson_2020_2" & Genotyped == "YES") %>% 
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

# In this study, each line was measured three times. We take the mean across blocks as the line mean

Durham_2014_f <-
  full_dataset %>% 
  filter(Study == "Durham_2014" & Genotyped == "YES") %>% 
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)


Patel_2021_f <-
  full_dataset %>% 
  filter(Study == "Patel_2021" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Install neccessary software and build helper functions

In addition to the R packages we load, plink 1.9 and beagle must also be installed. These software packages allow us to wrangle the data into the correct format and impute missing values, both of which are required to run GWAS with the plink.

plink is run from the terminal, but we pass the terminal command to R first, which then writes to the terminal. This makes our analysis reproducible. However, Windows and mac operating systems liase with the terminal differently, meaning different functions are required depending on your operating system. To make this easy we include the following code chunk, where you can specify whether you’re a windows or mac user.

Show the code
Operating_system <- "mac" # change this to "windows" if appropriate. Note that all downstream functions are informed by this
Show the code
# build functions to prepare data for GWAS

prep_for_e0_GWAS <- function(data, sex){
data %>% 
  mutate(line = glue("line{line}")) %>% 
  dplyr::select(line, e0)
}

prep_for_h_GWAS <- function(data, sex){
data %>% 
  mutate(line = glue("line{line}")) %>% 
  dplyr::select(line, h)
}

prep_for_ageing_GWAS <- function(data){
  data %>%
    mutate(line = glue("line{line}")) %>% 
    dplyr::select(line, ageing_axis_centered)
}

prep_for_baseline_mortality_GWAS <- function(data){
  data %>%
    mutate(line = glue("line{line}")) %>% 
    dplyr::select(line, baseline_mortality_axis_centered)
}

# I used bigsnpr::download_plink(dir = "code/windows") which puts it in the code folder - I'm using a windows operating system. The macOS version can also be downloaded into "code/macOS" 

# Beagle is a software package for phasing genotypes and imputing ungenotyped markers.
if(Operating_system == "mac"){plink <- paste(getwd(), "code/macOS/plink", sep = "/")}
if(Operating_system == "windows"){plink <- paste(getwd(), "code/windows/plink", sep = "/")}

# only need to download this once - change path depending on operating system
#beagle <- bigsnpr::download_beagle(
 #   dir = "/Users/tkeaney/Library/CloudStorage/OneDrive-JGU/Postdoc/DGRP_lifespan/DGRP_lifespan_inequality/code/macOS") 

# helper function to pass commands to the terminal
# Note that we set `intern = TRUE`, and pass the result of `system()` to `cat()`, 
# ensuring that the Terminal output will be printed in this knitr report.
# 
# This is the mac OS function
if(Operating_system == "mac"){
  run_command_mac <- function(shell_command, wd = getwd(), path = ""){
    cat(system(glue("cd ", wd, path, # tell terminal which directory to work in
                    "\n",shell_command), # on a second terminal line, run the plink command
               intern = TRUE), sep = '\n')  
  }
}

# This is the windows function 
if(Operating_system == "windows"){
  run_command_windows <- function(plink_command, wd = getwd(), path = "") {
    # Specify the full path to the plink executable within the 'code' subdirectory.
    plink_path <- paste(getwd(), "code/windows/plink", sep = "/")
    
    # Create the full shell command with the plink executable.
    command <- glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shQuote(plink_path)} {plink_command}")
    
    # Execute the combined command.
    result <- system(command, intern = TRUE)
    
    # Print the result.
    cat(result, sep = '\n')
    
    # Return the result as a character vector.
    return(result)
  }
  
  # sometimes we need to run terminal commands without plink - for windows, create a slightly different function to do this
  
  run_command_no_plink <- function(shell_command, wd = getwd(), path = "") {
    
    # Create the full shell command with the plink executable.
    command <- glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shell_command}")
    
    # Execute the combined command.
    result <- system(command, intern = TRUE)
    
    # Print the result.
    cat(result, sep = '\n')
    
    # Return the result as a character vector.
    return(result)
  }
}

Perform SNP quality control and impute missing data

Plink recognises three types of files that are necessary for GWA analysis: the .bed, .bim and .fam files.

.bed: the binary biallelic genotype table. Four options are possible:

  • 00 = homozygous for minor allele
  • 01 = missing genotype
  • 10 = heterozygous genotype
  • 11 = homozygous for major allele

The overwhelming majority of genotypes in the DGRP are homozygous for one of the alleles (i.e. 00 or 11).

.bim: extended variant information file accompanying the .bed file. It has six fields:

  1. chromosome code

  2. variant identifier

  3. position in morgans

  4. base-pair coordinate

  5. Minor allele

  6. Major allele

.fam: Plink sample information file. It can have the following elements:

  1. Family ID (‘FID’) (in our case just the DGRP line)

  2. Within-family ID (‘IID’; cannot be ‘0’) (in our case just the DGRP line)

  3. Within-family ID of father (‘0’ if father isn’t in dataset)

  4. Within-family ID of mother (‘0’ if mother isn’t in dataset)

  5. Sex code (‘1’ = male, ‘2’ = female, ‘0’ = unknown) - not important for us because we analyse the sexes separately.

  6. Phenotype value (‘1’ = control, ‘2’ = case, ‘-9’/‘0’/non-numeric = missing data): -9 for us because we supply the phenotypic data later.

We cleaned up the DGRP’s .bed/.bim/.fam files (available from the Mackay lab website) by:

  1. Removing any SNPs for which genotypes are missing in >10% of the 205 DGRP lines. We then use the software Beagle to impute the remaining missing genotypes. Imputation takes about half an hour, so ideally you only want to do it once.

  2. Removing SNPs with a minor allele frequency of less than 5% across the 205 lines. We have negligible power to detect associations for rare SNPs that occur at frequencies below this threshold.

In the plink-formatted genotype files, lines fixed for the major allele are coded as 2, and lines fixed for the minor allele as 0. SNPs with negative \(\beta\) coefficients therefore indicate that the minor allele is associated with higher trait values, while positive effect sizes means that the minor allele is associated with lower trait values.

Show the code
Run_function <- FALSE # Change this to TRUE to run - read through the code before you do this 

if(Run_function){
  
  # Use Plink to clean and subset the DGRP's SNP data as follows:
  # Only keep SNPs for which at least 90% of the 205 DGRP lines were successfully genotyped (--geno 0.1)
  # Only keep SNPs with a minor allele frequency of 0.05 or higher (--maf 0.05) across the 205 lines
  # Write the processed BIM/BED/FAM files to the data/Derived/plink_output directory
  
  output_directory <-  paste(getwd(), "data/Derived/plink_output", sep = "/")
  
  if(Operating_system == "windows"){
    run_command_windows(glue("--bfile dgrp2",
                             " --geno 0.1 --maf 0.05 --allow-no-sex", 
                             " --make-bed --out {shQuote(output_directory)}/dgrp2_QC_all_lines"), path = "data/Input/bfiles/")
  }
  
  if(Operating_system == "mac"){
    run_command_mac(glue("{plink} --bfile dgrp2",
                         " --geno 0.1 --maf 0.05 --allow-no-sex", 
                         " --make-bed --out ../dgrp2_QC_all_lines"), path = "/data/input/bfiles/")
  }
  # Use the shell command 'sed' to remove underscores from the DGRP line names in the .fam file (e.g. 'line_120' becomes 'line120')
  # Otherwise, these underscores cause trouble when we need to convert from PLINK to vcf format (vcf format uses underscore as a separator)
  if(Operating_system == "windows"){  
    for(i in 1:2) run_command_no_plink("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path = "/data/Derived/")
  }
  
  if(Operating_system == "mac"){            
    for(i in 1:2) run_command_mac("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path = "/data/Derived/")
  }
  # Now impute the missing genotypes using Beagle
  # This part uses the data for the full DGRP panel of >200 lines, to infer missing genotypes as accurately as possible. 
  # The bigsnpr package provides a helpful wrapper for Beagle called snp_beagleImpute(): it translates to a VCF file and back again using PLINK
  # Imputation with the below optimisation took about 25 mins on my computer, which is a high spec macbook by 2025 standards
  snp_beagleImpute(beagle, plink, 
                   bedfile.in = "data/Derived/plink_output/dgrp2_QC_all_lines.bed", 
                   bedfile.out = "data/Derived/plink_output/dgrp2_QC_all_lines_imputed.bed",
                   ncores = 10, 
                   memory.max = 32)
  
  # assign a sex of 'female' to all the DGRP lines (Beagle removes the sex, and it seems PLINK needs individuals to have a sex, 
  # despite PLINK having a command called --allow-no-sex)
  
  if(Operating_system == "windows"){ 
    run_command_windows("sed -i '' 's/    0   0   0/  0   0   2/' dgrp2_QC_all_lines_imputed.fam", 
                        path = "/data/Derived/plink_output/")
  }
  if(Operating_system == "mac"){ 
    run_command_mac("sed -i '' 's/    0   0   0/  0   0   2/' dgrp2_QC_all_lines_imputed.fam", 
                    path = "/data/Derived/plink_output/")
  }
  # Re-write the .bed file, to make sure the MAF threshold and minor/major allele designations are correctly assigned post-Beagle
  
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed",
                             " --geno 0.1 --maf 0.05 --allow-no-sex", 
                             " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed",
                         " --geno 0.1 --maf 0.05 --allow-no-sex", 
                         " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path = "/data/Derived/plink_output/")
  }
  #unlink(list.files("data/derived", pattern = "~", full.names = TRUE)) # delete the original files, which were given a ~ file name by PLINK
}

Get minor allele frequencies in the DGRP

Show the code
# Use PLINK to get the allele IDs and calculate the MAFs across the whole DGRP, for all SNPs that survived QC
  # The file created is called data/derived/plink.frq
if(!file.exists("data/Derived/plink_output/plink.frq")){
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --freq"), path = "/data/Derived/plink_output/")
  }
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --freq"), path = "/data/Derived/plink_output/")
  }
}

# Extract and save the MAFs, SNP positions, and major/minor alleles
MAFs <- 
  read.table("data/Derived/plink_output/plink.frq", header = TRUE, stringsAsFactors = FALSE) %>% 
  mutate(position = map_chr(
    strsplit(SNP, split = "_"), 
    function(x) x[2])) %>%
  dplyr::select(SNP, position, MAF, A1, A2) %>% 
  rename(minor_allele = A1,
         major_allele = A2) %>% 
  as_tibble()

Create a reduced list of LD-pruned SNPs with PLINK

1,646,615 variants passed the MAF and missingness quality control. However, proximity causes strong linkage disequilibrium, such that neighbouring SNPs tend to have similar associations with the trait under consideration in GWAS. Separate genomic regions can be identified by pruning the number of SNPs within a genomic region using the plink arguments --indep-pairwise 100 10 0.2, which prunes SNPs within 100kB sliding windows, sliding 10 variants along with each step, and allows a maximum pairwise correlation (\(r^2\)) threshold of 0.2 between loci within the window. With these parameters, 1,419,773 variants were removed, leaving 226,842.

Show the code
# indep-pairwise arguments are: 
# 100kB window size, 
# variant count to shift the window by 10 variants at the end of each step, 
# pairwise r^2 threshold of 0.2

if(!file.exists("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.out")) {
  
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --indep-pairwise 100 10 0.2"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --indep-pairwise 100 10 0.2"), path = "/data/Derived/plink_output/")
  }
}

Genomic_regions <-
  read.table("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in") %>% 
  rename(SNP = V1)

Build GWAS function

We use the --assoc flag to run a basic linear regression, fit with two data points: the mean phenotype for individuals homozygous for the minor allele and the mean phenotype for individuals homozygous for the major allele. The effect size (\(\beta\)) is the slope of this regression line. Negative effect sizes indicate that the minor allele is associated with a higher trait value than the major allele. The test statistic T and p-value are produced by a Wald test.

Show the code
run_GWAS <- function(phenotypes){
  
  # Make a list of the lines in our sample and save as a text file for passing to PLINK
  lines_to_keep <- phenotypes %>% dplyr::select(line) %>% mutate(line_2 = line)
  write.table(lines_to_keep, 
              row.names = FALSE, 
              col.names = FALSE, 
              file = "data/Derived/plink_output/lines_to_keep.txt", 
              quote = FALSE)

  # Now cull the PLINK files to just the lines that we measured, and re-apply the 
  # MAF cut-off of 0.05 for the new smaller sample of DGRP lines
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole
                             " --keep lines_to_keep.txt --geno 0.1 --maf 0.05", 
                             " --make-bed --out dgrp2_QC_focal_lines"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole
                         " --keep lines_to_keep.txt --geno 0.1 --maf 0.05", 
                         " --make-bed --out dgrp2_QC_focal_lines"), path = "/data/Derived/plink_output/")
  }
  

  
    # Define a function to add our phenotype data to a .fam file, which is needed for GWAS analysis and to make sure PLINK includes these samples
  # The 'phenotypes' data frame needs to have a column called 'line'
  add_phenotypes_to_fam <- function(filepath, phenotypes){
    read_delim(filepath, col_names = FALSE, delim = " ") %>% 
      dplyr::select(X1, X2, X3, X4, X5) %>% # Get all the non-phenotype columns
      left_join(phenotypes, 
                by = c("X1" = "line")) %>%
      write.table(file = "data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", 
                  col.names = FALSE, row.names = FALSE, 
                  quote = FALSE, sep = " ")
    
    unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
    file.rename("data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", 
                "data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
  }
  
  # edit the new FAM file to add the phenotype data from 'phenotypes'
  add_phenotypes_to_fam("data/Derived/plink_output/dgrp2_QC_focal_lines.fam", phenotypes)
  
  # Run GWAS 
  if(Operating_system == "windows"){ 
    run_command_windows("--bfile dgrp2_QC_focal_lines  --assoc --maf 0.05 --allow-no-sex", 
                        path = "/data/Derived/plink_output")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac("{plink} --bfile dgrp2_QC_focal_lines  --assoc --maf 0.05 --allow-no-sex", 
                    path = "/data/Derived/plink_output")
  }
  
  # wrangle the GWAS results
  
  Focal_name <- deparse(substitute(phenotypes))
  
  gwas_results <- read.table("data/Derived/plink_output/plink.qassoc", 
                             header = TRUE) %>% 
    dplyr::select(SNP, BETA, SE, "T", P)

  # Rename and compress the GWAS summary stats file 
  # The filter step means that only variants in the LD-pruned subset get saved to disk.
  gwas_results %>% 
  #  filter(SNP %in% (pull(read_tsv("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in", col_names = "SNP"), SNP))) %>% 
    write_tsv(glue("data/Derived/GWAS_results/{Focal_name}.tsv.gz"))
  unlink("data/Derived/plink_output/plink.qassoc")
  
  # Rename the plink log file
  file.rename("data/Derived/plink_output/plink.log",
              glue("data/Derived/plink_output/{Focal_name}_log.txt"))
  
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bim")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bed")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.log")
} 

Build manhattan plot function

Show the code
build_manhattan_plot <- function(gwas_results){
  
  manhattan_data <- gwas_results %>%
    mutate(position = str_split(SNP, "_"), # split the SNP name into logical bits
           chr = map_chr(position, ~ .x[1]), # the first bit is the chromosome arm - name the column appropriately
           position = as.numeric(map_chr(position, ~ .x[2])), # where on the chromosome is the SNP found
           pval = -1 * log10(P)) %>% # make visualising the p values easier
    dplyr::select(chr, position, pval) %>% 
    filter(chr != "4")
  
  # this next chunk finds convenient cuts for labels and colour changes 
  
  max_pos <- manhattan_data %>%
    group_by(chr) %>%
    summarise(
      max_pos = max(position), 
      middle_pos = (max_pos - min(position)) / 2,
      .groups = "drop") %>%
    as.data.frame()
  
  max_pos$max_pos <- c(0, cumsum(max_pos$max_pos[1:4]))
  
  max_pos$label_pos <- max_pos$max_pos + max_pos$middle_pos
  
  # combine the two dataframes created above
  
  manhattan_data <- manhattan_data %>%
    left_join(max_pos, by = "chr") %>%
    mutate(position = position + max_pos,
           chromosome_colour = case_when(chr == "2L" | chr == "3L" | chr == "X" ~ "A",
                                         .default = "B"),
           Notable = if_else(pval >= -log10(1e-08), "YES", "NO"))
  
  plot <- manhattan_data %>% filter(Notable == "NO") %>% 
    ggplot(aes(position, pval, group = chr, stroke = 0.01)) +
    geom_point(aes(colour = chromosome_colour), size = 1.8, alpha = 1) +
    geom_point(data =manhattan_data %>% filter(Notable == "YES"),
               aes(fill = chromosome_colour), colour = "black", shape = 21, size = 3.5, alpha = 1) +
    geom_hline(yintercept = -log10(1e-08), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +
    #geom_hline(yintercept = -log10(1e-05), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +
    scale_colour_manual(values = c(met.brewer(name = "Hokusai3")[3], met.brewer(name = "Hokusai3")[6])) +
    scale_fill_manual(values = c(met.brewer(name = "Hokusai3")[3], met.brewer(name = "Hokusai3")[6])) +
    scale_x_continuous(breaks = max_pos$label_pos, labels = max_pos$chr) +
    ylab("-log~10~(_p_)") + 
    xlab("Chromosome and position") +
    theme_classic() +
    theme(legend.position = "none",
          axis.title.y = element_markdown(size = 14),
          axis.title.x = element_markdown(size = 14),
          axis.text.x = element_text(size = 12),
          axis.text.y = element_text(size = 12))  
}

Preparing for cross phenotype meta-analysis

The power to detect variants associated with genetically correlated phenotypes can be increased if a meta-analytic approach is adopted (Zhu et al. 2018). Here, we used the cross-phenotype association (hereafter CPASSOC) approach developed by Zhou et al. (2015), which evaluates the null hypothesis that SNPs are not associated with any of the considered traits, weighted by the sample size of each study and adjusted for the trait genetic correlation matrix. In less language, CPASSOC evaluates the aggregated evidence of an association between a SNP and multiple phenotypes. The steps to apply CPASSOC are as follows:

  1. Estimate \(R\), the trait correlation matrix. In the DGRP, this can be done using SNP data (pruned to minimise the effect of linkage disequilibrium) or using line means.

  2. GWAS each trait separately (see above).

  3. Collate effect sizes for each trait into a vector \(\mathbf{\beta}\) for each SNP.

  4. Use a Wald test to estimate a test statistic \(T_{ijk}\) for the \(i^{th}\) SNP, \(j^{th}\) cohort and \(k^{th}\) treatment condition:

\[T_{ijk} = \frac{\hat\beta_{ijk}}{\hat{s}_{ijk}}\] , where \(\hat\beta_{ijk}\) and \(\hat{s}_{ijk}\) are the estimated coefficient and standard error for the \(i^{th}\) SNP in the \(j^{th}\), for the \(k^{th}\) treatment condition. From individual test statistics, a vector holding test statistics for all traits (\(T\)) can be built.

  1. Test whether \(\mathbf{\beta} = \mathbf{0}\). If the trait data are homogeneous (SNPs are expected to affect all traits in the same direction and at the same magnitude), use:

\[S_{Hom} = \frac{e^T(RW)^{-1}T(e^T(RW)^{-1}T)^T}{e^T(WRW)^{-1}e}\] where \(W\) is a diagonal matrix of weights for the individual test statistics (either the inverse of the variance or in our case the square root of the sample size of the \(j^{th}\) cohort: \(\sqrt{n_j}\)).

  1. If there is heterogeneity between trait measures (i.e. it is a reasonable expectation that genetic variants could affect some traits in one direction and others in the opposing direction), \(S_{Hom}\) is not appropriate. The ideal test statistic in this case is agnostic to the sign of a genetic variant’s phenoypic effect and includes only the cohorts and traits with a true contribution to the association of a genetic variant. To implement this, the absolute value \(\tau\) is used as a threshold, below which traits are not included in the test statistic. To allow for for effects of different signs in different environmental contexts, let \(w_{ijk} = \sqrt{n_j}\times \mathrm{sign}(T_{ijk})\). To calculate this heterogenous summary statistic first find,

\[S_{(\tau)} = \frac{e^T(R(\tau)W(\tau))^{-1}T(\tau)(e^T(R(\tau)W(\tau))^{-1}T(\tau))^T}{e^TW(\tau)^{-1}R(\tau)^{-1}W(\tau)^{-1}e}\]

This statistic, \(S_{Het}\) can be viewed as the maximum of the weighted sum of trait-specific test statistics that satisfy the \(\tau\) threshold.

When \(\tau\) is large, \(S(\tau)\) can be undefined if the test statistic falls below \(\tau\) in all contexts and cohorts. In this case \(S(\tau) = 0\). Our test statistic is then

\[\displaystyle S_{Het} = \max_{\tau \gt 0} S(\tau)\] Note that the restriction imposed by \(\tau\) and the sign specific \(w_{ijk}\) are the only differences between \(S_{Het}\) and \(S_{Hom}\).

The inclusion of \(\tau\) might give the impression of ‘cherry picking’. However, the value of \(S_{Het}\) lies in increasing power relative to univariate GWAS not by assessing if all tests return a strong, concordant association, but by identifying that more than one phenotype is strongly associated with a SNP. The more phenotypes associated, the larger \(S_{Het}\) becomes.

To generate p-values, \(S_{Het}\) is compared to a gamma distribution with a mean shift of test-statistics (see Zhou et al. (2015)).

Code to implement both statistics in R can be downloaded here, and is called below.

The functions

These are directly loaded from here

Show the code
require(compiler)
enableJIT(4)
[1] 3
Show the code
Non_Trucated_TestScore <- function(X, SampleSize, CorrMatrix)
{
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Sigma = ginv(CorrMatrix);
  XX = apply(X, 1, function(x) {
    x1 <- matrix(x, ncol = length(x), nrow = 1);
    T = W %*% Sigma %*% t(x1);
    T = (T*T) / (W %*% Sigma %*% t(W));
    return(T[1,1]);
  }
  );
  return(XX);
}
SHom <- cmpfun(Non_Trucated_TestScore);

Trucated_TestScore <- function(X, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T)
{
  N = dim(X)[2];
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  XX = apply(X, 1, function(x) {
    TTT = -1;
    
    if (isAllpossible == T ) {
      cutoff = sort(unique(abs(x)));      ## it will filter out any of them.
    } else {
      cutoff = seq(startCutoff, endCutoff, CutoffStep);
    }
    
    for (threshold in cutoff) {
      x1 = x;
      index = which(abs(x1) < threshold);
      
      if (length(index) == N) break;
      
      A = CorrMatrix;
      
      W1 = W;
      if (length(index) !=0 ) {
        x1 = x1[-index];
        A  = A[-index, -index];   ## update the matrix
        W1 = W[-index];
      }
      
      if (correct == 1)
      {
        index = which(x1 < 0);
        if (length(index) != 0) {
          W1[index] = -W1[index];    ## update the sign
        }
      }
      
      A = ginv(A);
      x1 = matrix(x1, nrow = 1);
      W1 = matrix(W1, nrow = 1);
      T = W1 %*% A %*% t(x1);
      T = (T*T) / (W1 %*% A %*% t(W1));
      
      if (TTT < T[1,1]) TTT = T[1,1];
    }
    return(TTT);
  }
  );
  return(XX);
}
SHet <- cmpfun(Trucated_TestScore);

EstimateGamma <- function (N = 1E6, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T) {
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Permutation = mvrnorm(n = N, mu = c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol = 1e-8, empirical = F);
  
  Stat =  Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix,
                             correct = correct, startCutoff = startCutoff, endCutoff = endCutoff,
                             CutoffStep = CutoffStep, isAllpossible = isAllpossible);
  a = min(Stat)*3/4
  ex3 = mean(Stat*Stat*Stat)
  V =   var(Stat);
  
  for (i in 1:100){
    E = mean(Stat)-a;
    k = E^2/V
    theta = V/E
    a = (-3*k*(k+1)*theta**2+sqrt(9*k**2*(k+1)**2*theta**4-12*k*theta*(k*(k+1)*(k+2)*theta**3-ex3)))/6/k/theta
  }
  
  para = c(k,theta,a);
  return(para);
}

EmpDist <- function (N = 1E6, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T) {
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Permutation = mvrnorm(n = N, mu = c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol = 1e-8, empirical = F);
  
  Stat =  Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix, correct = correct, startCutoff = startCutoff, endCutoff = endCutoff, CutoffStep = CutoffStep, isAllpossible = isAllpossible);
  
  return(Stat);
}

Analysing life expectancy and lifespan equality

Run univariate GWAS

Run GWAS for each environmental context and save the results.

Show the code
# prepare phenotype data for GWAS

Arya_f_l <- prep_for_e0_GWAS(Arya_2010_f)
Arya_m_l <- prep_for_e0_GWAS(Arya_2010_m)
Arya_f_h <- prep_for_h_GWAS(Arya_2010_f)
Arya_m_h <- prep_for_h_GWAS(Arya_2010_m)
Huang_f_18_l <- prep_for_e0_GWAS(Huang_2020_f_18)
Huang_f_18_h <- prep_for_h_GWAS(Huang_2020_f_18)
Huang_m_18_l <- prep_for_e0_GWAS(Huang_2020_m_18)
Huang_m_18_h <- prep_for_h_GWAS(Huang_2020_m_18)
Huang_f_25_l <- prep_for_e0_GWAS(Huang_2020_f_25)
Huang_f_25_h <- prep_for_h_GWAS(Huang_2020_f_25)
Huang_m_25_l <- prep_for_e0_GWAS(Huang_2020_m_25)
Huang_m_25_h <- prep_for_h_GWAS(Huang_2020_m_25)
Huang_f_28_l <- prep_for_e0_GWAS(Huang_2020_f_28)
Huang_f_28_h <- prep_for_h_GWAS(Huang_2020_f_28)
Huang_m_28_l <- prep_for_e0_GWAS(Huang_2020_m_28)
Huang_m_28_h <- prep_for_h_GWAS(Huang_2020_m_28)
Wilson_f_l_1 <- prep_for_e0_GWAS(Wilson_2020_f_1)
Wilson_f_h_1 <- prep_for_h_GWAS(Wilson_2020_f_1)
Wilson_f_l_2 <- prep_for_e0_GWAS(Wilson_2020_f_2)
Wilson_f_h_2 <- prep_for_h_GWAS(Wilson_2020_f_2)
Durham_f_l <- prep_for_e0_GWAS(Durham_2014_f)
Durham_f_h <- prep_for_h_GWAS(Durham_2014_f)
Patel_f_l <- prep_for_e0_GWAS(Patel_2021_f)
Patel_f_h <- prep_for_h_GWAS(Patel_2021_f)

# if not already done, run the GWA tests

if(!file.exists("data/Derived/GWAS_results/Arya_f_l.tsv.gz")) {
run_GWAS(Arya_f_l)
run_GWAS(Arya_m_l)
run_GWAS(Arya_f_h)
run_GWAS(Arya_m_h)
run_GWAS(Huang_f_18_l)
run_GWAS(Huang_f_18_h)
run_GWAS(Huang_m_18_l)
run_GWAS(Huang_m_18_h)
run_GWAS(Huang_f_25_l)
run_GWAS(Huang_f_25_h)
run_GWAS(Huang_m_25_l)
run_GWAS(Huang_m_25_h)
run_GWAS(Huang_f_28_l)
run_GWAS(Huang_f_28_h)
run_GWAS(Huang_m_28_l)
run_GWAS(Huang_m_28_h)
run_GWAS(Wilson_f_l_1)
run_GWAS(Wilson_f_h_1)
run_GWAS(Wilson_f_l_2)
run_GWAS(Wilson_f_h_2)
run_GWAS(Durham_f_l)
run_GWAS(Durham_f_h)
run_GWAS(Patel_f_l)
run_GWAS(Patel_f_h)
}

Load the results

Show the code
# load GWAS results

# Life expectancy

Arya_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_l.tsv.gz") 
  
Huang_f_18_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_l.tsv.gz")

Huang_f_25_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_l.tsv.gz") 

Huang_f_28_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_l.tsv.gz")

Wilson_f_l_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_l_1.tsv.gz") 

Wilson_f_l_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_l_2.tsv.gz") 

Durham_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_l.tsv.gz")

Patel_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_l.tsv.gz")

Arya_m_l_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_l.tsv.gz")

Huang_m_18_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_l.tsv.gz")

Huang_m_25_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_l.tsv.gz")

Huang_m_28_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_l.tsv.gz")

# Lifespan equality

Arya_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_h.tsv.gz")
  
Huang_f_18_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_h.tsv.gz")

Huang_f_25_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_h.tsv.gz") 

Huang_f_28_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_h.tsv.gz")

Wilson_f_h_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_h_1.tsv.gz")

Wilson_f_h_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_h_2.tsv.gz")

Durham_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_h.tsv.gz")

Patel_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_h.tsv.gz")

Arya_m_h_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_h.tsv.gz")

Huang_m_18_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_h.tsv.gz")

Huang_m_25_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_h.tsv.gz")

Huang_m_28_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_h.tsv.gz")

As a point of comparison, we find the sum of significant associations detected by univariate GWAS

Table SX. Genotype to phenotype associations detected by univariate GWAS, for life expectancy. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.

Show the code
# filter down to sig associations
e0_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_l_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_l_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_l_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_l_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
p_05_SNPs_l <-
  bind_rows(
    
    Arya_f_l_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_l_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_l_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_l_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_l_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_l_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

e0_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(p_05_SNPs_l),
          `p < 1e-05 genomic regions` = nrow(p_05_SNPs_l %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(e0_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(e0_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
Study Sex Temperature p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
Arya et al 2010 Female 25 29 5 0 0
Huang et al 2020 Female 18 14 4 0 0
Huang et al 2020 Female 25 43 7 0 0
Huang et al 2020 Female 28 34 0 0 0
Wilson et al 2020 Female 25 22 4 0 0
Wilson et al 2020* Female 25 10 1 0 0
Durham et al 2014 Female 25 51 5 0 0
Patel et al 2021 Female 23 96 4 0 0
Arya et al 2010 Male 25 12 5 0 0
Huang et al 2020 Male 18 26 2 0 0
Huang et al 2020 Male 25 39 2 0 0
Huang et al 2020 Male 28 22 2 0 0
Totals 389 39 0 0

Table SX. Genotype to phenotype associations detected by univariate GWAS, for lifespan equality. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.

Show the code
# filter down to sig associations
h_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_h_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_h_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_h_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_h_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
p_05_SNPs_h <-
  bind_rows(
    
    Arya_f_h_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_h_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_h_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_h_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_h_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_h_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

h_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(p_05_SNPs_h),
          `p < 1e-05 genomic regions` = nrow(p_05_SNPs_h %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(h_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(h_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
Study Sex Temperature p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
Arya et al 2010 Female 25 12 3 0 0
Huang et al 2020 Female 18 4 1 0 0
Huang et al 2020 Female 25 15 8 0 0
Huang et al 2020 Female 28 41 5 0 0
Wilson et al 2020 Female 25 50 3 0 0
Wilson et al 2020* Female 25 49 3 0 0
Durham et al 2014 Female 25 12 4 0 0
Patel et al 2021 Female 23 17 2 0 0
Arya et al 2010 Male 25 14 3 0 0
Huang et al 2020 Male 18 14 2 0 0
Huang et al 2020 Male 25 24 1 0 0
Huang et al 2020 Male 28 38 3 0 0
Totals 270 37 0 0

Applying cross-phenotype meta-analysis

Generate the genetic correlation matrix

We calculate the genetic correlations between traits using both the line mean and SNP effect size comparisons. Following Zhu et al. (2015), we use the SNP correlations for analysis.

Show the code
# use the BETA coefficients to build the SNP correlation matrix

SNP_beta_e0 <-
  bind_rows(
    Arya_f_l_GWAS %>% 
      mutate(Study = "Arya_2010", Sex = "Female", Temperature = 25),
    Huang_f_18_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 18),
    Huang_f_25_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 25),
    Huang_f_28_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 28),
    Wilson_f_l_1_GWAS %>% 
      mutate(Study= "Wilson_2020_1", Sex= "Female", Temperature= 25),
    Wilson_f_l_2_GWAS %>% 
      mutate(Study= "Wilson_2020_2", Sex= "Female", Temperature= 25),
    Durham_f_l_GWAS %>% 
      mutate(Study= "Durham_2014", Sex= "Female", Temperature= 25),
    Patel_f_l_GWAS %>% 
      mutate(Study= "Patel_2021", Sex= "Female", Temperature= 23),
    Arya_m_l_GWAS %>% 
      mutate(Study= "Arya_2010", Sex= "Male", Temperature= 25),
    Huang_m_18_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature= 18),
    Huang_m_25_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature = 25),
    Huang_m_28_l_GWAS %>% 
      mutate(Study = "Huang_2020", Sex = "Male", Temperature = 28)) %>% 
  dplyr::select(SNP, BETA, Study, Sex, Temperature) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Sex, Temperature)) %>% 
  rename(Arya_f_25 = Arya_2010_Female_25,
         Huang_f_18 = Huang_2020_Female_18,
         Huang_f_25 = Huang_2020_Female_25,
         Huang_f_28 = Huang_2020_Female_28,
         Wilson_f_25_1 = Wilson_2020_1_Female_25,
         Wilson_f_25_2 = Wilson_2020_2_Female_25,
         Durham_f_25 = Durham_2014_Female_25,
         Patel_f_23 = Patel_2021_Female_23,
         Arya_m_25 = Arya_2010_Male_25,
         Huang_m_18 = Huang_2020_Male_18,
         Huang_m_25 = Huang_2020_Male_25,
         Huang_m_28 = Huang_2020_Male_28)

SNP_beta_e0_LD_pruned <-
  SNP_beta_e0 %>% 
  inner_join(Genomic_regions)

SNP_beta_h <-
  bind_rows(
    Arya_f_h_GWAS %>% 
      mutate(Study = "Arya_2010", Sex = "Female", Temperature = 25),
    Huang_f_18_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 18),
    Huang_f_25_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 25),
    Huang_f_28_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 28),
    Wilson_f_h_1_GWAS %>% 
      mutate(Study= "Wilson_2020_1", Sex= "Female", Temperature= 25),
    Wilson_f_h_2_GWAS %>% 
      mutate(Study= "Wilson_2020_2", Sex= "Female", Temperature= 25),
    Durham_f_h_GWAS %>% 
      mutate(Study= "Durham_2014", Sex= "Female", Temperature= 25),
    Patel_f_h_GWAS %>% 
      mutate(Study= "Patel_2021", Sex= "Female", Temperature= 23),
    Arya_m_h_GWAS %>% 
      mutate(Study= "Arya_2010", Sex= "Male", Temperature= 25),
    Huang_m_18_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature= 18),
    Huang_m_25_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature = 25),
    Huang_m_28_h_GWAS %>% 
      mutate(Study = "Huang_2020", Sex = "Male", Temperature = 28)) %>% 
  dplyr::select(SNP, BETA, Study, Sex, Temperature) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Sex, Temperature)) %>% 
  rename(Arya_f_25 = Arya_2010_Female_25,
         Huang_f_18 = Huang_2020_Female_18,
         Huang_f_25 = Huang_2020_Female_25,
         Huang_f_28 = Huang_2020_Female_28,
         Wilson_f_25_1 = Wilson_2020_1_Female_25,
         Wilson_f_25_2 = Wilson_2020_2_Female_25,
         Durham_f_25 = Durham_2014_Female_25,
         Patel_f_23 = Patel_2021_Female_23,
         Arya_m_25 = Arya_2010_Male_25,
         Huang_m_18 = Huang_2020_Male_18,
         Huang_m_25 = Huang_2020_Male_25,
         Huang_m_28 = Huang_2020_Male_28)

SNP_beta_h_LD_pruned <-
  SNP_beta_h %>% 
  inner_join(Genomic_regions)
  

SNP_e0_corr_matrix <- cor(SNP_beta_e0_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")
SNP_h_corr_matrix <- cor(SNP_beta_h_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")


line_data <-
  bind_rows(Arya_2010_f,
            Huang_2020_f_18,
            Huang_2020_f_25,
            Huang_2020_f_28,
            Wilson_2020_f_1,
            Wilson_2020_f_2,
            Durham_2014_f,
            Patel_2021_f,
            Arya_2010_m,
            Huang_2020_m_18,
            Huang_2020_m_25,
            Huang_2020_m_28) %>% 
  dplyr::select(line, Treatment, Sex, Temperature, e0, h) %>% 
  pivot_wider(values_from = c(e0, h), names_from = c(Treatment, Sex, Temperature)) 

line_data_e0 <-
  line_data %>% 
  dplyr::select(contains("e0")) %>% 
  rename(Arya_f_25 = e0_Arya_2010_1_Female_25,
         Huang_f_18 = e0_Huang_2020_1_Female_18,
         Huang_f_25 =  e0_Huang_2020_2_Female_25,
         Huang_f_28 = e0_Huang_2020_3_Female_28,
         Wilson_f_25_1 = e0_Wilson_2020_1_Female_25,
         Wilson_f_25_2 = e0_Wilson_2020_2_Female_25,
         Durham_f_25 = e0_Durham_2014_1_Female_25,
         Patel_f_23 = e0_Patel_2021_1_Female_23,
         Arya_m_25 = e0_Arya_2010_1_Male_25,
         Huang_m_18 = e0_Huang_2020_1_Male_18,
         Huang_m_25 = e0_Huang_2020_2_Male_25,
         Huang_m_28 = e0_Huang_2020_3_Male_28)

line_data_h <-
  line_data %>% 
  dplyr::select(!contains("e0"), -line) %>% 
  rename(Arya_f_25 = h_Arya_2010_1_Female_25,
         Huang_f_18 = h_Huang_2020_1_Female_18,
         Huang_f_25 =  h_Huang_2020_2_Female_25,
         Huang_f_28 = h_Huang_2020_3_Female_28,
         Wilson_f_25_1 = h_Wilson_2020_1_Female_25,
         Wilson_f_25_2 = h_Wilson_2020_2_Female_25,
         Durham_f_25 = h_Durham_2014_1_Female_25,
         Patel_f_23 = h_Patel_2021_1_Female_23,
         Arya_m_25 = h_Arya_2010_1_Male_25,
         Huang_m_18 = h_Huang_2020_1_Male_18,
         Huang_m_25 = h_Huang_2020_2_Male_25,
         Huang_m_28 = h_Huang_2020_3_Male_28)

line_e0_corr_matrix <- cor(line_data_e0, use = "pairwise.complete.obs", method = "spearman")
line_h_corr_matrix <- cor(line_data_h, use = "pairwise.complete.obs", method = "spearman")

Let’s visualise the genetic correlation between lifespan measures. First for life expectancy:

Show the code
breaksList <- seq(-1, 1, by = 0.02)

pheatmap(SNP_e0_corr_matrix, breaks = breaksList, 
main = "", legend_labels = c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),
color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
legend = TRUE, cutree_rows = 3, cutree_cols = 3, angle_col = 45, border_color = "white")

Now for lifespan equality

Show the code
pheatmap(SNP_h_corr_matrix, breaks = breaksList, 
main = "", legend_labels = c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),
color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
legend = TRUE, cutree_rows = 3, cutree_cols = 3, angle_col = 45, border_color = "white")

Calculate meta-analytic test statistics

The purpose of this meta-analysis is to search for SNPs that have some effect on life expectancy or lifespan equality, expressed in many different contexts (sexes, temperatures and mating status’).

To conduct CPASSOC for a given SNP, we need a \(T\) statistic from each environmental context. A different number of lines were included in each GWAS, which caused small differences in the number of SNPs assessed in each cohort. We therefore prune the list of SNPs to those included in all univariate analyses. After pruning, 1,603,213 SNPs remain.

The Bonferroni adjusted significance threshold for this number of tests is \(p_{adj} = \frac{0.05}{1603213} = 3.12\times 10^{-8}\); here and for all future analysis, we use p \(< 10^{-8}\) as our significance threshold, as this is slightly more conservative and easier to quickly interpret.

Life expectancy

Show the code
Arya_f_l_T <- Arya_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
  
Huang_f_18_l_T <- Huang_f_18_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)

Huang_f_25_l_T <- Huang_f_25_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)

Huang_f_28_l_T <- Huang_f_28_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)

Wilson_f_l_1_T <- Wilson_f_l_1_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_1 = T)

Wilson_f_l_2_T <- Wilson_f_l_2_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_2 = T)

Durham_f_l_T <- Durham_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Durham_f_25 = T)

Patel_f_l_T <- Patel_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Patel_f_23 = T)

Arya_m_l_T <- Arya_m_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

Huang_m_18_l_T <- Huang_m_18_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18  = T)

Huang_m_25_l_T <- Huang_m_25_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

Huang_m_28_l_T <- Huang_m_28_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28  = T)

all_e0_t_stats <-
  Arya_f_l_T %>%
  inner_join(Huang_f_18_l_T, by = "SNP") %>%
  inner_join(Huang_f_25_l_T, by = "SNP") %>%
  inner_join(Huang_f_28_l_T, by = "SNP") %>% 
  inner_join(Wilson_f_l_1_T, by = "SNP") %>% 
  inner_join(Wilson_f_l_2_T, by = "SNP") %>% 
  inner_join(Durham_f_l_T, by = "SNP") %>% 
  inner_join(Patel_f_l_T, by = "SNP") %>% 
  inner_join(Arya_m_l_T, by = "SNP") %>% 
  inner_join(Huang_m_18_l_T, by = "SNP") %>% 
  inner_join(Huang_m_25_l_T, by = "SNP") %>%
  inner_join(Huang_m_28_l_T, by = "SNP")

all_e0_t_stats_values <-
  all_e0_t_stats %>% 
  dplyr::select(2:13)

Sample_size_all <- c(165, 183, 186, 177, 161, 161, 176, 193, 165, 183, 186, 177) 

if(!file.exists("data/Derived/GWAS_results/all_e0_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_all, SNP_e0_corr_matrix);

S_het <- SHet(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)

# bind meta statistics with the univariate effect sizes

all_e0_meta_results <- 
  all_e0_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) 

write_csv(all_e0_meta_results, "data/Derived/GWAS_results/all_e0_meta_results.csv")

} else all_e0_meta_results <- read_csv("data/Derived/GWAS_results/all_e0_meta_results.csv")

Lifespan equality

Show the code
Arya_f_h_T <- Arya_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
  
Huang_f_18_h_T <- Huang_f_18_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)

Huang_f_25_h_T <- Huang_f_25_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)

Huang_f_28_h_T <- Huang_f_28_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)

Wilson_f_h_1_T <- Wilson_f_h_1_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_1 = T)

Wilson_f_h_2_T <- Wilson_f_h_2_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_2 = T)

Durham_f_h_T <- Durham_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Durham_f_25 = T)

Patel_f_h_T <- Patel_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Patel_f_23 = T)

Arya_m_h_T <- Arya_m_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

Huang_m_18_h_T <- Huang_m_18_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18  = T)

Huang_m_25_h_T <- Huang_m_25_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

Huang_m_28_h_T <- Huang_m_28_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28  = T)


all_h_t_stats <-
  Arya_f_h_T %>%
  inner_join(Huang_f_18_h_T, by = "SNP") %>%
  inner_join(Huang_f_25_h_T, by = "SNP") %>%
  inner_join(Huang_f_28_h_T, by = "SNP") %>% 
  inner_join(Wilson_f_h_1_T, by = "SNP") %>%
  inner_join(Wilson_f_h_2_T, by = "SNP") %>% 
  inner_join(Durham_f_h_T, by = "SNP") %>% 
  inner_join(Patel_f_h_T, by = "SNP") %>% 
  inner_join(Arya_m_h_T, by = "SNP") %>% 
  inner_join(Huang_m_18_h_T, by = "SNP") %>% 
  inner_join(Huang_m_25_h_T, by = "SNP") %>%
  inner_join(Huang_m_28_h_T, by = "SNP") 
  

all_h_t_stats_values <-
  all_h_t_stats %>% 
  dplyr::select(2:13)

if(!file.exists("data/Derived/GWAS_results/all_h_meta_results.csv")) {

S_hom <- SHom(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_all, SNP_h_corr_matrix);

S_het <- SHet(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)

# bind meta statistics with the univariate effect sizes

all_h_meta_results <- 
  all_h_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het)

write_csv(all_h_meta_results, "data/Derived/GWAS_results/all_h_meta_results.csv")
} else all_h_meta_results <- read_csv("data/Derived/GWAS_results/all_h_meta_results.csv")

Visualise the results

We combine GWAS summary statistics calculated from lifespan data measured across different contexts. It’s likely that some SNPs have G x E interactions that would lead to a heterogeneous effect across treatments. We therefore utilise the S_het calculated p-values.

First lets show the effect of CPASSOC on the number of variants found to be associated with life expectancy and lifespan equality.

Table SX. the number of variants associated with life expectancy and lifespan equality at various significance thresholds, estimated by univariate GWAS or CPASSOC.

Show the code
tibble(Analysis = c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),
       Trait = c("Life expectancy", "Life expectancy", "Lifespan equality", "Lifespan equality"),
       `p < 1e-05 variants` = c(sum(all_e0_meta_results$meta_p_het < 1e-05),
                                nrow(p_05_SNPs_l),
                                sum(all_h_meta_results$meta_p_het < 1e-05),
                                nrow(p_05_SNPs_h)),
       `p < 1e-05 genomic regions` = c(nrow(all_e0_meta_results %>% filter(meta_p_het < 1e-05) %>% inner_join(Genomic_regions)),
                                          nrow(p_05_SNPs_l %>% filter(Pruned_variant == "YES")),
                                          nrow(all_h_meta_results %>% filter(meta_p_het < 1e-05) %>% inner_join(Genomic_regions)),
                                          nrow(p_05_SNPs_h %>% filter(Pruned_variant == "YES"))),
       `p < 1e-08 variants` = c(sum(all_e0_meta_results$meta_p_het < 1e-08),
                                sum(e0_table$`p < 1e-08 variants`),
                                sum(all_h_meta_results$meta_p_het < 1e-08),
                                sum(h_table$`p < 1e-08 variants`)),
       `p < 1e-08 genomic regions` = c(nrow(all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% inner_join(Genomic_regions)),
                                          sum(h_table$`p < 1e-08 variants`),
                                          nrow(all_h_meta_results %>% filter(meta_p_het < 1e-08) %>% inner_join(Genomic_regions)),
                                          sum(h_table$`p < 1e-08 variants`)))  %>% 
  kable() %>% 
  kable_styling()
Analysis Trait p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
CPASSOC Life expectancy 467 248 121 58
Univariate Life expectancy 389 39 0 0
CPASSOC Lifespan equality 158 78 19 10
Univariate Lifespan equality 270 37 0 0

Table SX. genes that encompass or are very close to the genetic variants that have strong associations with life expectancy.

Show the code
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

life_expectancy_variants <-
  all_e0_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^18, 3)/10^18,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

life_expectancy_variants %>% 
  my_data_table()

Table SX. genes that encompass or are very close to the genetic variants that have strong associations with lifespan equality.

Show the code
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

lifespan_equality_variants <-
  all_h_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^15, 3)/10^15,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

lifespan_equality_variants %>% 
  my_data_table()

Now build some ‘Manhattan plots’ to show where these significant associations can be found:

Show the code
e0_results <- 
  all_e0_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

h_results <- 
  all_h_meta_results %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

# plot the results using the manhattan plot custom function we defined earlier

e0_het_plot <- build_manhattan_plot(e0_results) +
  labs(title = "Life expectancy") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 21), expand = c(0, 0))

h_het_plot <- build_manhattan_plot(h_results) +
  labs(title = "Lifespan equality") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 21), expand = c(0, 0))

e0_het_plot + h_het_plot + plot_annotation(tag_levels = "A")

Figure XX. Manhattan plot showing the -Log10 p-value for each genomic region’s effect on A) life expectancy and B) lifespan equality.

Plot the univariate effect sizes for each of the regions associated with life expectancy / lifespan equality at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC.

Life expectancy

Show the code
SNP_heatmap_e0 <-
  SNP_beta_e0 %>% 
  inner_join(
    all_e0_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_e0$SNP

SNP_heatmap_e0 <- SNP_heatmap_e0 %>% dplyr::select(-SNP) %>% as.matrix()

rownames(SNP_heatmap_e0) <- row_name

breaksList <- seq(-7, 7, by = 0.01)

annotation_SNPs <- 
  all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_f_25",
                   "Huang_f_18",
                   "Huang_f_25",
                   "Huang_f_28",
                   "Wilson_f_25_1",
                   "Wilson_f_25_2",
                   "Durham_f_25",
                   "Patel_f_23",
                   "Arya_m_25",
                   "Huang_m_18",
                   "Huang_m_25",
                   "Huang_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")


pheatmap(SNP_heatmap_e0, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)

Figure SX. univariate effect sizes for each of the genomic regions associated with life expectancy at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Effect sizes are expressed in days added to life expectancy per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.

Lifespan equality

Show the code
SNP_heatmap_h <-
  SNP_beta_h %>% 
  inner_join(
    all_h_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_h$SNP
SNP_heatmap_h <- SNP_heatmap_h %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_h) = row_name

breaksList <- seq(-0.15, 0.15, by = 0.001)

annotation_SNPs_h <- 
  all_h_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))


# create a SNP annotation column

SNP_details_h <- annotation_SNPs_h %>%
  as.data.frame()

my_SNP_categories_h <- data.frame(row.names = SNP_details_h[, 1], Chromosome = SNP_details_h[, 2])

pheatmap(SNP_heatmap_h, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 3, cutree_cols = 4, angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, 
         annotation_row = my_SNP_categories_h,
         fontsize = 8, labels_col = my_col_names)

Figure XX. univariate effect sizes for each of the genomic regions associated with lifespan equality at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Effect sizes are expressed in equality added per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.

Analysing the rate of ageing and baseline mortality

Axes of ageing rate and baseline mortality

We’ve shown that orthogonal deviation from the regression of lifespan equality on life expectancy closely corresponds to the rate of ageing (\(\beta\)) parameter in a Gompertz-Makeham mortality model. To identify regions of the genome associated with the rate of ageing, we can calculate a rate of ageing index for each line in each treatment. To create this index, we rotate the coordinate system of the life expectancy and lifespan equality plane by \(\theta\) degrees, where \(\theta\) is the angle between the positive x-axis and the regression of lifespan equality on life expectancy.

Finding the slopes

Show the code
# create a dataframe with which mean regression lines can be predicted from each model. It spans 4 SDs in either direction.

 nd <- 
  tibble(e0 = seq(from = 0, to = 180, length.out = 180))

# fit the models

Arya_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Arya_2010_f, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Arya_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Arya_f_slope <-
  as_draws_df(Arya_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Arya_regression_line_f <-
   fitted(Arya_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Arya_m_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Arya_2010_m, chains = 4, cores = 4, file = "data/Derived/Ageing_axis_slopes/Arya_m_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Arya_m_slope <-
  as_draws_df(Arya_m_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Arya_regression_line_m <-
   fitted(Arya_m_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_18_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_18, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_18_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_18_slope <-
  as_draws_df(Huang_f_18_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_18_regression_line <-
   fitted(Huang_f_18_model, newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_18_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_18, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_18_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_18_slope <-
  as_draws_df(Huang_m_18_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_18_regression_line <-
   fitted(Huang_m_18_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_25_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_25, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_25_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_25_slope <-
  as_draws_df(Huang_f_25_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_25_regression_line <-
   fitted(Huang_f_25_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_25_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_25, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_25_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_25_slope <-
  as_draws_df(Huang_m_25_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_25_regression_line <-
   fitted(Huang_m_25_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_28_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_28, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_28_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_28_slope <-
  as_draws_df(Huang_f_28_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_28_regression_line <-
   fitted(Huang_f_28_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_28_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_28, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_28_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_28_slope <-
  as_draws_df(Huang_m_28_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_28_regression_line <-
   fitted(Huang_m_28_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Wilson_f_model_1 <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Wilson_2020_f_1, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Wilson_f_slope_1",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Wilson_f_slope_1 <-
  as_draws_df(Wilson_f_model_1) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Wilson_f_regression_line_1 <-
   fitted(Wilson_f_model_1,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Wilson_f_model_2 <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Wilson_2020_f_2, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Wilson_f_slope_2",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Wilson_f_slope_2 <-
  as_draws_df(Wilson_f_model_2) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Wilson_f_regression_line_2 <-
   fitted(Wilson_f_model_2,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Durham_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Durham_2014_f, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Durham_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Durham_f_slope <-
  as_draws_df(Durham_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Durham_f_regression_line <-
   fitted(Durham_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)


Patel_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Patel_2021_f, chains = 4, cores = 4, file = "data/Derived/Ageing_axis_slopes/Patel_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Patel_f_slope <-
  as_draws_df(Patel_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)
 
 Patel_regression_line <-
   fitted(Patel_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

With regression coefficients found, we use the following formula to calculate the angle (in radians) between the mean regression line and the x-axis:

\(\theta = tan^{-1}(\beta)\)

where \(\beta\) is the point estimate of the slope from each model posterior distribution.

Show the code
Arya_f_angle <- atan(Arya_f_slope)
Arya_m_angle <- atan(Arya_m_slope)
Huang_f_18_angle <- atan(Huang_f_18_slope)
Huang_m_18_angle <- atan(Huang_m_18_slope)
Huang_f_25_angle <- atan(Huang_f_25_slope)
Huang_m_25_angle <- atan(Huang_m_25_slope)
Huang_f_28_angle <- atan(Huang_f_28_slope)
Huang_m_28_angle <- atan(Huang_m_28_slope)
Wilson_f_1_angle <- atan(Wilson_f_slope_1)
Wilson_f_2_angle <- atan(Wilson_f_slope_2)
Durham_f_angle <- atan(Durham_f_slope)
Patel_f_angle <- atan(Patel_f_slope)

We then rotated the coordinate system of the life expectancy and lifespan equality plane clockwise by \(\theta\):

\[x' = -(x\cos(\theta) + y\sin(\theta))\]

\[y' = -(x\sin(\theta) - y\cos(\theta))\]

where \(x'\) and \(y'\) are the vectors of genotype means for baseline mortality rate and ageing rate, and \(x\) and \(y\) are vectors of genotype means for life expectancy and lifespan equality. We perform this transformation on the unscaled data.

Show the code
Arya_2010_f <-
  Arya_2010_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Arya_f_angle) + h*sin(Arya_f_angle)),
         ageing_axis = -1*(e0*sin(Arya_f_angle) - h*cos(Arya_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Arya_2010_m <-
  Arya_2010_m %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Arya_m_angle) + h*sin(Arya_m_angle)),
         ageing_axis = -1*(e0*sin(Arya_m_angle) - h*cos(Arya_m_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_18 <-
  Huang_2020_f_18 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_18_angle) + h*sin(Huang_f_18_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_18_angle) - h*cos(Huang_f_18_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_18 <-
  Huang_2020_m_18 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_18_angle) + h*sin(Huang_m_18_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_18_angle) - h*cos(Huang_m_18_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_25 <-
  Huang_2020_f_25 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_25_angle) + h*sin(Huang_f_25_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_25_angle) - h*cos(Huang_f_25_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_25 <-
  Huang_2020_m_25 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_25_angle) + h*sin(Huang_m_25_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_25_angle) - h*cos(Huang_m_25_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_28 <-
  Huang_2020_f_28 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_28_angle) + h*sin(Huang_f_28_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_28_angle) - h*cos(Huang_f_28_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_28 <-
  Huang_2020_m_28 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_28_angle) + h*sin(Huang_m_28_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_28_angle) - h*cos(Huang_m_28_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Wilson_2020_f_1 <-
  Wilson_2020_f_1 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Wilson_f_1_angle) + h*sin(Wilson_f_1_angle)),
         ageing_axis = -1*(e0*sin(Wilson_f_1_angle) - h*cos(Wilson_f_1_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Wilson_2020_f_2 <-
  Wilson_2020_f_2 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Wilson_f_2_angle) + h*sin(Wilson_f_2_angle)),
         ageing_axis = -1*(e0*sin(Wilson_f_2_angle) - h*cos(Wilson_f_2_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Durham_2014_f <-
  Durham_2014_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Durham_f_angle) + h*sin(Durham_f_angle)),
         ageing_axis = -1*(e0*sin(Durham_f_angle) - h*cos(Durham_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Patel_2021_f <- 
  Patel_2021_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Patel_f_angle) + h*sin(Patel_f_angle)),
         ageing_axis = -1*(e0*sin(Patel_f_angle) - h*cos(Patel_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Finally, simulate curves from for the Gompertz-Makeham distribution to show the correlation between the \(\alpha\) and \(\beta\) parametrs and our baseline mortality and ageing rate proxies.

Show the code
# script to draw h~e0 for different gompertz b
# a sequence
a_seq <- seq(-30,2,0.02)
# b sequence
b_seq <- seq(-5,-0.5,0.5)
b_seq <- exp(b_seq)

gomp_seq <- data.frame(b=NULL,e0=NULL,h=NULL)

age_seq <- seq(0,10000,0.1)

Run_sim <- FALSE # change to TRUE to run the sim

if(Run_sim){
  
  for(i in 1:length(b_seq)){
    for (j in 1:length(a_seq)){
      lx <- exp(-exp(a_seq[j])/b_seq[i]*(exp(b_seq[i]*age_seq)-1))
      lx <- lx[lx!=0]
      if(tail(lx,1)<0.1){
        e0_gomp <- sum(lx)*0.1
        disparity <- -sum(lx*log(lx))*0.1
        h_gomp <- -log(disparity/e0_gomp)
        
      }
      gomp_seq <- rbind(gomp_seq,c(b_seq[i],e0_gomp,h_gomp))
    }
  }
  write_csv(gomp_seq, "data/Derived/gompertz_simulation.csv")
} else{
  gomp_seq <- read_csv("data/Derived/gompertz_simulation.csv")}

names(gomp_seq) <- c("b","e0","h")

gomp_seq$b <- log(gomp_seq$b)

gomp_seq$b <- as.factor(gomp_seq$b)

Plot the line means, coloured by their value on the ageing rate axis.

Show the code
rotated_axis_plot <- function(data, regression_data, which_axis, fill_title, study_title, limit){
  data %>% 
    ggplot(aes(x = e0, y = h)) +
    geom_line(data = gomp_seq,
              aes(x = e0, y = h, group = b), alpha = 0.4, linetype = 2) +
    geom_point(aes(fill = which_axis), shape = 21, size = 4) +
    scale_fill_moma_c("Avedon", direction = -1, limits = c(-1*limit, limit)) +
    geom_smooth(data = regression_data,
                aes(y = Estimate),
                stat = "identity",
                alpha = 1/2, linewidth = 1) +
    scale_x_continuous(limits = c(5, 145), 
                       breaks = c(0, 30, 60, 90, 120), expand = c(0, 0)) +
    scale_y_continuous(limits = c(0.1, 3.5), 
                       breaks = c(1, 2, 3), expand = c(0, 0)) +
    labs(fill = fill_title,
         x = "Life expectancy",
         y = "Lifespan equality",
         title = study_title) +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5),
          panel.grid = element_blank(),
          axis.title.y = element_markdown(size = 12),
          axis.title.x = element_markdown(size = 12),
          axis.text.x = element_text(size = 10),
          axis.text.y = element_text(size = 10))
}

a <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, which_axis = Arya_2010_f$ageing_axis_centered, 
                 "Ageing\nrate", "Arya 25C females", limit = 1.6)

a.1 <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, 
                        which_axis = Arya_2010_f$baseline_mortality_axis_centered, 
                        "Baseline\nmortality", "Arya 25C females", limit = 60)

b <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, which_axis = Arya_2010_m$ageing_axis_centered, 
                 "Ageing\nrate", "Arya 25C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))

b.1 <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, 
                         which_axis = Arya_2010_m$baseline_mortality_axis_centered, 
                 "Baseline\nmortality", "Arya 25C males", limit = 60) #+
  #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))

c <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, 
                       which_axis = Huang_2020_f_18$ageing_axis_centered,
                      "Ageing\nrate", "Huang 18C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))

c.1 <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, 
                       which_axis = Huang_2020_f_18$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 18C females", limit = 60) #+
  #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))

d <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, 
                       which_axis = Huang_2020_m_18$ageing_axis_centered,
                      "Ageing\nrate", "Huang 18C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))

d.1 <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, 
                       which_axis = Huang_2020_m_18$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 18C males", limit = 60) #+
  #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))

e <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, 
                       which_axis = Huang_2020_f_25$ageing_axis_centered,
                      "Ageing\nrate", "Huang 25C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))

e.1 <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, 
                       which_axis = Huang_2020_f_25$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 25C females", limit = 60) #+
  #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))

f <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, 
                       which_axis = Huang_2020_m_25$ageing_axis_centered,
                      "Ageing\nrate", "Huang 25C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))

f.1 <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, 
                       which_axis = Huang_2020_m_25$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 25C males", limit = 60) #+
  #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))

g <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, 
                       which_axis = Huang_2020_f_28$ageing_axis_centered,
                      "Ageing\nrate", "Huang 28C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

g.1 <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, 
                       which_axis = Huang_2020_f_28$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 28C females", limit = 60) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

h <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, 
                       which_axis = Huang_2020_m_28$ageing_axis_centered,
                      "Ageing\nrate", "Huang 28C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

h.1 <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, 
                       which_axis = Huang_2020_m_28$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 28C males", limit = 60) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

i <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, 
                       which_axis = Wilson_2020_f_1$ageing_axis_centered,
                      "Ageing\nrate", "Wilson 25C females 1", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))

i.1 <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, 
                       which_axis = Wilson_2020_f_1$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Wilson 25C females 1", limit = 60) #+
  #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))

j <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, 
                       which_axis = Wilson_2020_f_2$ageing_axis_centered,
                      "Ageing\nrate", "Wilson 25C females 2", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))

j.1 <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, 
                       which_axis = Wilson_2020_f_2$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Wilson 25C females 2", limit = 60) #+
  #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))

k <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, 
                       which_axis = Durham_2014_f$ageing_axis_centered,
                      "Ageing\nrate", "Durham 25C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))

k.1 <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, 
                       which_axis = Durham_2014_f$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Durham 25C females", limit = 60) #+
  #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))

l <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, 
                       which_axis = Patel_2021_f$ageing_axis_centered,
                      "Ageing\nrate", "Patel 23C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))

l.1 <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, 
                       which_axis = Patel_2021_f$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Patel 23C females", limit = 60) #+
  #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))
Show the code
(a | b | c) / (d | e | f) / (g | h| i) / (j | k | l) + #guide_area()) +
  plot_layout(guides = 'collect')

Figure SX. Points show DGRP lines, shaded by their genotypic values for the rate of ageing. Dashed curves show simulation outcomes from a Gompertz-Makeham distribution: the rate of ageing differs between curves but is fixed within them, where the baseline mortality decreases as curves progress to the right. Note that colour shows the rate of ageing relative to the mean within the treatment.

Show the code
(a.1 | b.1 | c.1) / (d.1 | e.1 | f.1) / (g.1 | h.1| i.1) / (j.1 | k.1 | l.1) + #guide_area()) +
  plot_layout(guides = 'collect')

Figure SX. As per Figure SX, except colours indicate our proxy for the baseline rate of ageing.

Run univariate GWAS

Conduct GWAS and save the results.

Show the code
Arya_f_ageing <- prep_for_ageing_GWAS(Arya_2010_f)
Arya_m_ageing <- prep_for_ageing_GWAS(Arya_2010_m)
Huang_f_18_ageing <- prep_for_ageing_GWAS(Huang_2020_f_18)
Huang_m_18_ageing <- prep_for_ageing_GWAS(Huang_2020_m_18)
Huang_f_25_ageing <- prep_for_ageing_GWAS(Huang_2020_f_25)
Huang_m_25_ageing <- prep_for_ageing_GWAS(Huang_2020_m_25)
Huang_f_28_ageing <- prep_for_ageing_GWAS(Huang_2020_f_28)
Huang_m_28_ageing <- prep_for_ageing_GWAS(Huang_2020_m_28)
Wilson_f_ageing_1 <- prep_for_ageing_GWAS(Wilson_2020_f_1)
Wilson_f_ageing_2 <- prep_for_ageing_GWAS(Wilson_2020_f_2)
Durham_f_ageing <- prep_for_ageing_GWAS(Durham_2014_f)
Patel_f_ageing <- prep_for_ageing_GWAS(Patel_2021_f)

Arya_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Arya_2010_f)
Arya_m_baseline_mortality <- prep_for_baseline_mortality_GWAS(Arya_2010_m)
Huang_f_18_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_18)
Huang_m_18_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_18)
Huang_f_25_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_25)
Huang_m_25_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_25)
Huang_f_28_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_28)
Huang_m_28_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_28)
Wilson_f_baseline_mortality_1 <- prep_for_baseline_mortality_GWAS(Wilson_2020_f_1)
Wilson_f_baseline_mortality_2 <- prep_for_baseline_mortality_GWAS(Wilson_2020_f_2)
Durham_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Durham_2014_f)
Patel_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Patel_2021_f)

if(!file.exists("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz")) {
run_GWAS(Arya_f_ageing)
run_GWAS(Arya_m_ageing)
run_GWAS(Huang_f_18_ageing)
run_GWAS(Huang_m_18_ageing)
run_GWAS(Huang_f_25_ageing)
run_GWAS(Huang_m_25_ageing)
run_GWAS(Huang_f_28_ageing)
run_GWAS(Huang_m_28_ageing)
run_GWAS(Wilson_f_ageing_1)
run_GWAS(Wilson_f_ageing_2)
run_GWAS(Durham_f_ageing)
run_GWAS(Patel_f_ageing)

run_GWAS(Arya_f_baseline_mortality)
run_GWAS(Arya_m_baseline_mortality)
run_GWAS(Huang_f_18_baseline_mortality)
run_GWAS(Huang_m_18_baseline_mortality)
run_GWAS(Huang_f_25_baseline_mortality)
run_GWAS(Huang_m_25_baseline_mortality)
run_GWAS(Huang_f_28_baseline_mortality)
run_GWAS(Huang_m_28_baseline_mortality)
run_GWAS(Wilson_f_baseline_mortality_1)
run_GWAS(Wilson_f_baseline_mortality_2)
run_GWAS(Durham_f_baseline_mortality)
run_GWAS(Patel_f_baseline_mortality)
}

Arya_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz") 
Arya_m_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_ageing.tsv.gz") 
Huang_f_18_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_ageing.tsv.gz")
Huang_m_18_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_ageing.tsv.gz")
Huang_f_25_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_ageing.tsv.gz")
Huang_m_25_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_ageing.tsv.gz")
Huang_f_28_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_ageing.tsv.gz")
Huang_m_28_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_ageing.tsv.gz")
Wilson_f_ageing_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_1.tsv.gz")
Wilson_f_ageing_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_2.tsv.gz")
Durham_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_ageing.tsv.gz")
Patel_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_ageing.tsv.gz")

Arya_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_baseline_mortality.tsv.gz") 
Arya_m_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_baseline_mortality.tsv.gz") 
Huang_f_18_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_baseline_mortality.tsv.gz")
Huang_m_18_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_baseline_mortality.tsv.gz")
Huang_f_25_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_baseline_mortality.tsv.gz")
Huang_m_25_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_baseline_mortality.tsv.gz")
Huang_f_28_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_baseline_mortality.tsv.gz")
Huang_m_28_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_baseline_mortality.tsv.gz")
Wilson_f_baseline_mortality_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_1.tsv.gz")
Wilson_f_baseline_mortality_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_2.tsv.gz")
Durham_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_baseline_mortality.tsv.gz")
Patel_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_baseline_mortality.tsv.gz")

Table SX. Genotype to phenotype associations detected by univariate GWAS, for the rate of ageing. The number of genomic regions indicates the number of genetic variants associated with the rate of ageing after LD pruning. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.

Show the code
# filter down to sig associations
ageing_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_ageing_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_ageing_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_ageing_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_ageing_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_ageing_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_ageing_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
ageing_p_05_SNPs <-
  bind_rows(
    
    Arya_f_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_ageing_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_ageing_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_ageing_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

ageing_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(ageing_p_05_SNPs),
          `p < 1e-05 genomic regions` = nrow(ageing_p_05_SNPs %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(ageing_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(ageing_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
Study Sex Temperature p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
Arya et al 2010 Female 25 19 1 0 0
Huang et al 2020 Female 18 19 3 1 0
Huang et al 2020 Female 25 34 8 0 0
Huang et al 2020 Female 28 58 6 0 0
Wilson et al 2020 Female 25 33 7 0 0
Wilson et al 2020* Female 25 30 5 0 0
Durham et al 2014 Female 25 25 3 0 0
Patel et al 2021 Female 23 12 3 0 0
Arya et al 2010 Male 25 6 2 0 0
Huang et al 2020 Male 18 5 1 0 0
Huang et al 2020 Male 25 53 5 0 0
Huang et al 2020 Male 28 80 6 0 0
Totals 351 49 1 0

Table SX. Genotype to phenotype associations detected by univariate GWAS, for baseline mortality rate. The number of genomic regions indicates the number of genetic variants associated with baseline mortality after LD pruning. The total row shows the number of unique candidate variants identified across all studies. *Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.

Show the code
# filter down to sig associations
scaling_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_baseline_mortality_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_baseline_mortality_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
scaling_p_05_SNPs <-
  bind_rows(
    
    Arya_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_baseline_mortality_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_baseline_mortality_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

scaling_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(scaling_p_05_SNPs),
          `p < 1e-05 genomic regions` = nrow(scaling_p_05_SNPs %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(scaling_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(scaling_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
Study Sex Temperature p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
Arya et al 2010 Female 25 29 5 0 0
Huang et al 2020 Female 18 14 4 0 0
Huang et al 2020 Female 25 43 7 0 0
Huang et al 2020 Female 28 35 0 0 0
Wilson et al 2020 Female 25 22 4 0 0
Wilson et al 2020* Female 25 10 1 0 0
Durham et al 2014 Female 25 51 5 0 0
Patel et al 2021 Female 23 96 4 0 0
Arya et al 2010 Male 25 12 5 0 0
Huang et al 2020 Male 18 26 2 0 0
Huang et al 2020 Male 25 39 2 0 0
Huang et al 2020 Male 28 22 2 0 0
Totals 390 39 0 0

Applying cross-phenotype meta-analysis

Generate the genetic correlation matrix

Using SNP effect sizes, we calculate the genetic correlations between a) rates of ageing and b) baseline mortality, measured in different environmental contexts.

Show the code
# use the BETA coefficients to build the SNP correlation matrix for the rate of ageing

SNP_ageing_axis_data <-
  bind_rows(
    Arya_f_ageing_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Female"),
    
    Arya_m_ageing_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Male"),
    
    Huang_f_18_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Female"),
    
    Huang_m_18_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Male"),
    
    Huang_f_25_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Female"),
    
    Huang_m_25_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Male"),
    
    Huang_f_28_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Female"),
    
    Huang_m_28_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Male"),
    
    Wilson_f_ageing_1_GWAS %>% 
      mutate(Study = "Wilson_2020_1", Temperature = 25, Sex = "Female"),
    
    Wilson_f_ageing_2_GWAS %>% 
      mutate(Study = "Wilson_2020_2", Temperature = 25, Sex = "Female"),
    
    Durham_f_ageing_GWAS %>% 
      mutate(Study = "Durham_2014", Temperature = 25, Sex = "Female"),
    
    Patel_f_ageing_GWAS %>% 
      mutate(Study = "Patel_2021", Temperature = 23, Sex = "Female")) %>% 
  dplyr::select(SNP, BETA, Study, Temperature, Sex) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Temperature, Sex)) 

SNP_ageing_axis_LD_pruned <- SNP_ageing_axis_data %>% inner_join(Genomic_regions)

SNP_ageing_axis_corr_matrix <- cor(SNP_ageing_axis_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")

# use the BETA coefficients to build the SNP correlation matrix for scaling

SNP_baseline_mortality_axis_data <-
 bind_rows(
    Arya_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Female"),
    
    Arya_m_baseline_mortality_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Male"),
  
    Huang_f_18_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Female"),
    
    Huang_m_18_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Male"),
    
    Huang_f_25_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Female"),
    
    Huang_m_25_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Male"),
  
    Huang_f_28_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Female"),
    
    Huang_m_28_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Male"),
    
     Wilson_f_baseline_mortality_1_GWAS %>% 
      mutate(Study = "Wilson_2020_1", Temperature = 25, Sex = "Female"),
    
    Wilson_f_baseline_mortality_2_GWAS %>% 
      mutate(Study = "Wilson_2020_2", Temperature = 25, Sex = "Female"),
    
    Durham_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Durham", Temperature = 25, Sex = "Female"),
  
    Patel_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Patel", Temperature = 23, Sex = "Female")) %>% 
  dplyr::select(SNP, BETA, Study, Temperature, Sex) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Temperature, Sex))

SNP_baseline_mortality_axis_LD_pruned <- SNP_baseline_mortality_axis_data %>% inner_join(Genomic_regions)


SNP_baseline_mortality_axis_corr_matrix <- cor(SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")

Calculate meta-analytic test statistics

The purpose of these meta-analyses is to detect SNPs associated with 1) the rate of ageing and 2) baseline mortality rate.

Run CPASSOC for the rate of ageing

Show the code
# rate of ageing

ageing_axis_Arya_f_T <- 
  Arya_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
    
ageing_axis_Arya_m_T <- 
  Arya_m_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

ageing_axis_Huang_f_18_T <- 
  Huang_f_18_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)
  
ageing_axis_Huang_m_18_T <- 
  Huang_m_18_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18 = T)

ageing_axis_Huang_f_25_T <- 
  Huang_f_25_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)
  
ageing_axis_Huang_m_25_T <- 
  Huang_m_25_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

ageing_axis_Huang_f_28_T <- 
  Huang_f_28_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)
  
ageing_axis_Huang_m_28_T <- 
  Huang_m_28_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28 = T)
    
ageing_axis_Wilson_f_1_T <- 
  Wilson_f_ageing_1_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_1 = T)

ageing_axis_Wilson_f_2_T <- 
  Wilson_f_ageing_2_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_2 = T)

ageing_axis_Durham_f_T <- 
  Durham_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Durham_f = T)

ageing_axis_Patel_f_T <- 
  Patel_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Patel_f = T)
    

ageing_axis_t_stats <-
  ageing_axis_Arya_f_T %>%
  inner_join(ageing_axis_Arya_m_T, by = "SNP") %>%
  inner_join(ageing_axis_Huang_f_18_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_18_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_f_25_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_25_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_f_28_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_28_T, by = "SNP") %>% 
  inner_join(ageing_axis_Wilson_f_1_T, by = "SNP") %>%
  inner_join(ageing_axis_Wilson_f_2_T, by = "SNP") %>%
  inner_join(ageing_axis_Durham_f_T, by = "SNP") %>%
  inner_join(ageing_axis_Patel_f_T, by = "SNP") 

ageing_axis_t_stat_values <-
  ageing_axis_t_stats %>% 
  dplyr::select(2:13)

Sample_size_ageing_axis <- c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)

if(!file.exists("data/Derived/GWAS_results/ageing_axis_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits e.g. if a SNP has a sex or environment opposite effect on lifespan

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix);

S_het <- SHet(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)


ageing_axis_meta_results <- 
  ageing_axis_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) # add the unadjusted p values

write_csv(ageing_axis_meta_results, "data/Derived/GWAS_results/ageing_axis_meta_results.csv")

} else ageing_axis_meta_results <- read_csv("data/Derived/GWAS_results/ageing_axis_meta_results.csv")

Run CPASSOC for the baseline rate of mortality

Show the code
baseline_mortality_axis_Arya_f_T <- 
  Arya_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
    
baseline_mortality_axis_Arya_m_T <- 
  Arya_m_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

baseline_mortality_axis_Huang_f_18_T <- 
  Huang_f_18_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)
  
baseline_mortality_axis_Huang_m_18_T <- 
  Huang_m_18_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18 = T)

baseline_mortality_axis_Huang_f_25_T <- 
  Huang_f_25_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)
  
baseline_mortality_axis_Huang_m_25_T <- 
  Huang_m_25_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

baseline_mortality_axis_Huang_f_28_T <- 
  Huang_f_28_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)
  
baseline_mortality_axis_Huang_m_28_T <- 
  Huang_m_28_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28 = T)
    
baseline_mortality_axis_Wilson_f_1_T <- 
  Wilson_f_baseline_mortality_1_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_1 = T)

baseline_mortality_axis_Wilson_f_2_T <- 
  Wilson_f_baseline_mortality_2_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_2 = T)

baseline_mortality_axis_Durham_f_T <- 
  Durham_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Durham_f = T)

baseline_mortality_axis_Patel_f_T <- 
  Patel_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Patel_f = T)
    

baseline_mortality_axis_t_stats <-
  baseline_mortality_axis_Arya_f_T %>%
  inner_join(baseline_mortality_axis_Arya_m_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Huang_f_18_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_18_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_f_25_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_25_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_f_28_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_28_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Wilson_f_1_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Wilson_f_2_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Durham_f_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Patel_f_T, by = "SNP") 


baseline_mortality_axis_t_stat_values <-
  baseline_mortality_axis_t_stats %>% 
  dplyr::select(2:13)

Sample_size_baseline_mortality_axis <- c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)

if(!file.exists("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits (e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix);

S_het <- SHet(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)


baseline_mortality_axis_meta_results <- 
  baseline_mortality_axis_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) # add the unadjusted p values

write_csv(baseline_mortality_axis_meta_results, "data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")

} else baseline_mortality_axis_meta_results <- read_csv("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")

Visualise the results

We combine GWAS \(T\) statistics calculated for the rate of ageing and baseline mortality measured across different contexts. It’s possible that some SNPs have G x E interactions that lead to a heterogeneous effect across phenotypes. We therefore utilise the S_het calculated p-values.

First lets show the effect of CPASSOC on the number of variants found to be associated with the rate of ageing and the scaling of mortality risk.

Table SX. the number of variants associated with ageing rate and baseline mortality at various significance thresholds, estimated by univariate GWAS or CPASSOC. The number of genomic regions indicates the number of variants detected after LD pruning.

Show the code
tibble(Analysis = c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),
       Trait = c("Ageing rate", "Ageing rate", "Scaling", "Scaling"),
       `p < 1e-05 variants` = c(sum(ageing_axis_meta_results$meta_p_het < 1e-05),
                                nrow(ageing_p_05_SNPs),
                                sum(baseline_mortality_axis_meta_results$meta_p_het < 1e-05),
                                nrow(scaling_p_05_SNPs)),
       `p < 1e-05 genomic regions` = c(nrow(ageing_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-05) %>%
                                                 inner_join(Genomic_regions)),
                                          nrow(ageing_p_05_SNPs %>% filter(Pruned_variant == "YES")),
                                          nrow(baseline_mortality_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-05) %>%
                                                 inner_join(Genomic_regions)),
                                          nrow(scaling_p_05_SNPs %>% filter(Pruned_variant == "YES"))),
       `p < 1e-08 variants` = c(sum(ageing_axis_meta_results$meta_p_het < 1e-08),
                                sum(ageing_table$`p < 1e-08 variants`),
                                sum(baseline_mortality_axis_meta_results$meta_p_het < 1e-08),
                                sum(scaling_table$`p < 1e-08 variants`)),
       `p < 1e-08 genomic regions` = c(nrow(ageing_axis_meta_results %>% 
                                                 filter(meta_p_het < 1e-08) %>% 
                                                 inner_join(Genomic_regions)),
                                          sum(ageing_table$`p < 1e-08 genomic regions`),
                                          nrow(baseline_mortality_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-08) %>%
                                                 inner_join(Genomic_regions)),
                                          sum(scaling_table$`p < 1e-08 genomic regions`)))  %>% 
  kable() %>% 
  kable_styling()
Analysis Trait p < 1e-05 variants p < 1e-05 genomic regions p < 1e-08 variants p < 1e-08 genomic regions
CPASSOC Ageing rate 219 118 28 21
Univariate Ageing rate 351 49 1 0
CPASSOC Scaling 456 243 116 54
Univariate Scaling 390 39 0 0

Table SX. genes that encompass or are very close to the genetic variants that have associations with the rate of ageing.

Show the code
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

ageing_rate_genes <-
  ageing_axis_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^9, 3)/10^9,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

ageing_rate_genes %>% 
  my_data_table()

Table SX. genes that encompass or are very close to the genetic variants that have associations with baseline mortality rate.

Show the code
scaling_genes <-
  baseline_mortality_axis_meta_results %>% 
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^10, 3)/10^10,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

scaling_genes %>% 
  my_data_table()

Now lets build some ‘Manhattan plots’ to show where these significant associations can be found:

Show the code
ageing_axis_results <- 
  ageing_axis_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

baseline_mortality_axis_results <- 
  baseline_mortality_axis_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

# plot the results using the manhattan plot custom function we defined earlier

ageing_axis_het_plot <- 
  build_manhattan_plot(ageing_axis_results) +
  labs(title = "Ageing rate") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 19), expand = c(0, 0))

baseline_mortality_axis_het_plot <- 
  build_manhattan_plot(baseline_mortality_axis_results) +
  labs(title = "Baseline mortality") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 19), expand = c(0, 0))

baseline_mortality_axis_het_plot + ageing_axis_het_plot  

Figure XX. Manhattan plots showing the -Log10 p-value for each locus’ effect on baseline mortality and the rate of ageing.

Plot the univariate effect sizes for each of the genomic regions associated with the rate of ageing at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC.

Show the code
SNP_heatmap_ageing_axis <-
  SNP_ageing_axis_data %>% 
  inner_join(
    ageing_axis_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_ageing_axis$SNP
SNP_heatmap_ageing_axis <- SNP_heatmap_ageing_axis %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_ageing_axis) <- row_name

breaksList <- seq(-0.1, 0.1, by = 0.001)

annotation_SNPs <- 
  ageing_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_2010_f_25",
                   "Huang_2020_f_18",
                   "Huang_2020_f_25",
                   "Huang_2020_f_28",
                   "Wilson_2020_f_25_1",
                   "Wilson_2020_f_25_2",
                   "Durham_2014_f_25",
                   "Patel_2021_f_23",
                   "Arya_2010_m_25",
                   "Huang_2020_m_18",
                   "Huang_2020_m_25",
                   "Huang_2020_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")

  pheatmap(SNP_heatmap_ageing_axis, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)

Figure XX. univariate effect sizes for each of the genomic regions associated with ageing rate at the genome-wide significance threshold (p < \(10^{-8}\)) after CPASSOC. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases ageing rate in the conditions the study was performed in.

Plot the univariate effect sizes for each of the genomic regions associated with the scaling of mortality risk at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC.

Show the code
SNP_heatmap_baseline_mortality_axis <-
  SNP_baseline_mortality_axis_data %>% 
  inner_join(
    baseline_mortality_axis_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_baseline_mortality_axis$SNP
SNP_heatmap_baseline_mortality_axis <- SNP_heatmap_baseline_mortality_axis %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_baseline_mortality_axis) <- row_name

breaksList <- seq(-7, 7, by = 0.01)

annotation_SNPs <- 
  baseline_mortality_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_2010_f_25",
                   "Huang_2020_f_18",
                   "Huang_2020_f_25",
                   "Huang_2020_f_28",
                   "Wilson_2020_f_25_1",
                   "Wilson_2020_f_25_2",
                   "Durham_2014_f_25",
                   "Patel_2021_f_23",
                   "Arya_2010_m_25",
                   "Huang_2020_m_18",
                   "Huang_2020_m_25",
                   "Huang_2020_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")


  pheatmap(SNP_heatmap_baseline_mortality_axis, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, 
         annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)

Figure XX. univariate effect sizes for each of the SNPs associated with mortality scaling at the genome-wide significance threshold (p < \(0.05^{-8}\)) after CPASSOC. Effect sizes are expressed in standard deviations from the mean life expectancy found in each study. Studies are clustered by similiarity in SNP effects on the X axis, while SNPs are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases life expectancy in the conditions the study was performed in.

Are ageing and baseline mortality polygenic?

If traits are polygenic, the majority of the genetic variants that effect their expession will have effects that are too small to detect with GWA, unless sample sizes are truly gigantic. A promising alternative is to instead look to see if effects estimated in one study can be replicated in a second, independent study. To test this in our dataset, we selected one trait measurement from each study trait that phenotyped females, at 25C, with an opportunity for mating.

As a control, this is what happens if we bin and plot the relationship between two uncorrelated variables

Show the code
sim_data <-
  tibble(draw_1 = rnorm(220437, 0, 1),
         draw_2 = rnorm(220437, 0, 1)) %>%
  arrange(draw_1) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(draw_1 = mean(draw_1), draw_2 = mean(draw_2))

(boyle_plot_sim <-
  sim_data %>%
  ggplot(aes(draw_1, draw_2)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-4, 4), ylim = c(-4, 4)) +
  xlab("Mean effect size \n (random draw 1)") +
  ylab("Mean effect size \n (random draw 2)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 14))
)

Show the code
ageing_boyle_data <-
  SNP_ageing_axis_LD_pruned %>% 
  dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>% 
  filter_at(vars(2:4), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Huang_2020_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Huang_2020_25_Female = mean(Huang_2020_25_Female), 
            Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_2014_25_Female = mean(Durham_2014_25_Female))

ageing_boyle_data_2 <-
  SNP_ageing_axis_LD_pruned %>% 
  dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>% 
  filter_at(vars(2:3), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Wilson_2020_1_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_2014_25_Female = mean(Durham_2014_25_Female))

baseline_mortality_boyle_data <-
  SNP_baseline_mortality_axis_LD_pruned %>% 
  dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_25_Female) %>% 
  filter_at(vars(2:4), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Huang_2020_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Huang_2020_25_Female = mean(Huang_2020_25_Female), 
            Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_25_Female = mean(Durham_25_Female))

baseline_mortality_boyle_data_2 <-
  SNP_baseline_mortality_axis_LD_pruned %>% 
  dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_25_Female) %>% 
  filter_at(vars(2:3), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Wilson_2020_1_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_25_Female = mean(Durham_25_Female))

boyle_plot_H_W <-
  ageing_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  xlab("Ageing SNP effect (Huang et al.)") +
  ylab("Ageing SNP effect (Wilson et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_plot_H_D <-
  ageing_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Durham_2014_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  labs(x = "Ageing SNP effect (Huang et al.)",
       y = "Ageing SNP effect (Durham et al.)") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        text = element_text(size = 10))

boyle_plot_W_D <-
  ageing_boyle_data_2 %>%
  ggplot(aes(Wilson_2020_1_25_Female, Durham_2014_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  xlab("Ageing SNP effect (Wilson et al.)") +
  ylab("Ageing SNP effect (Durham et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_baseline_plot_H_W <-
  baseline_mortality_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Huang et al.)",
       y = "Scaling SNP effect (Wilson et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_baseline_plot_H_D <-
  baseline_mortality_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Durham_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Huang et al.)",
       y = "Scaling SNP effect (Durham et al.)") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        text = element_text(size = 10))

boyle_baseline_plot_W_D <-
  baseline_mortality_boyle_data_2 %>%
  ggplot(aes(Wilson_2020_1_25_Female, Durham_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
    coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Wilson et al.)",
       y = "Scaling SNP effect (Durham et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

(boyle_plot_H_W + boyle_plot_H_D + boyle_plot_W_D) /
  (boyle_baseline_plot_H_W + boyle_baseline_plot_H_D + boyle_baseline_plot_W_D) 

Figure SX. Each point represents the mean effect size for a group of 100 genomic regions, ordered by association with female ageing rate (top panels) or female baseline mortality risk (bottom panels), measured in the study named on the x-axis. While traits were measured different laboratories, conditions were similar in each treatment: females were housed at 25C, with an opportunity for mating. Effect sizes are expressed as trait standard deviations.

Figure 4

Show the code
f4_a <- c + labs(title = NULL) + theme(legend.position="none")
f4_b <- e + labs(title = NULL) + theme(legend.position = "none")
f4_c <- g + labs(title = NULL) + theme(legend.position = "none")

part_1 <-
  (f4_a + f4_b + f4_c) +
  plot_layout(#guides = collect, 
              axis_titles = "collect")

f4_e <-
  boyle_plot_H_W  + 
  labs(x = "SNP effect (Huang et al.)",
       y = "SNP effect (Wilson et al.)") 

f4_f <-
  boyle_plot_H_D +
  labs(x = "SNP effect (Huang et al.)",
       y = "SNP effect (Durham et al.)") 

f4_g <-
  boyle_plot_W_D +
  labs(x = "SNP effect (Wilson et al.)",
       y = "SNP effect (Durham et al.)") 
  
part_3 <- (f4_e + f4_f + f4_g)
  

part_1 / (ageing_axis_het_plot + labs(title = NULL)) / part_3 + plot_annotation(tag_levels = "A")

Figure 4. detection of genetic variants associated with the rate of ageing. A-C demonstrate our ageing rate metric used for genome-wide association analysis. Dashed lines show simulations from the gompertz distribution: each line was generated with a different rate of ageing value and extends as the baseline mortality rate changes. Note that the slope from the regressions of lifespan equality on life expectancy align closely with these curves. Points show fly genotypes; deviations from the regression line therefore indicate that genotypes differ in the rate of ageing.

Source Code
---
title: "Genome wide analyses"
format: html
editor: source
  #markdown: 
   # wrap: 72
execute:
  warning: false
  message: false
---

# Load packages and data

The `MASS` package is required to run the `CPASSOC`. Unfortunately this clashes with the `dplyr` `select()`. So be prepared to use `dplyr::select()` to get some things to work if you're adapting the code for your own use.

```{r}
#| results: hide

library(tidyverse) # tidy coding, ggplot etc
library(data.table) # for the rleid function
library(glue) # for coding within strings
library(bigsnpr) # to install: devtools::install_github("privefl/bigsnpr")
#library(pander) # for slick simple tables
library(kableExtra) # for medium sized tables
library(DT) # for large, searchable tables
library(brms) # for bayesian models
library(tidybayes) # for bayesian plots
library(ggtext) # for markdown syntax in ggplot
library(ggnewscale) # to reset colour scales
library(MetBrewer) # for more colour palettes
library(MoMAColors) # nicer colours once again
library(PNWColors) # even more colours
#library(hexbin) # for density heat maps
#library(rcartocolor) # even more nice colours
library(patchwork) # for combining plots
#library(ggrepel) # for labelling ggplots
library(pheatmap) # for heat maps
library(MASS) # needed for CPASSOC
library(Matrix) # needed for CPASSOC
#library(flexsurv) # for survival analysis
#library(rptR) # for finding the intraclass correlation coefficient

# build a helper function that produces a table to display our data

# Create a function to build HTML searchable tables

my_data_table <- function(df){
  datatable(
    df, rownames=FALSE,
    autoHideNavigation = TRUE,
    extensions = c("Scroller",  "Buttons"),
    options = list(
      autoWidth = TRUE,
      dom = 'Bfrtip',
      deferRender=TRUE,
      scrollX=TRUE, scrollY=1000,
      scrollCollapse=TRUE,
      buttons =
        list('pageLength', 'colvis', 'csv', list(
          extend = 'pdf',
          pageSize = 'A4',
          orientation = 'landscape',
          filename = 'Lifespan_data')),
      pageLength = 100
    )
  )
}

```

## Load variant/gene annotations

DGRP variant annotations were downloaded from the [DGRP website](http://dgrp2.gnets.ncsu.edu/data/website/dgrp.fb557.annot.txt) and gene annotations for all the genes covered by DGRP variants, from the `org.Dm.eg.db` database object from `Bioconductor`.

These will be useful later when we aim to identify whether variants with notable associations with a trait overlap with any genes.

```{r}
#| results: hide

# Helper function to split a vector into chunks 
chunker <- function(x, max_chunk_size) split(x, ceiling(seq_along(x) / max_chunk_size))

if(!file.exists("data/derived/annotations.csv")){
  
  # Load annotation file, get important info
  
  annot <- read.table("data/input/dgrp.fb557.annot.txt", header = FALSE, stringsAsFactors = FALSE)
  
  get.info <- function(rows){
    lapply(rows, function(row){
      site.class.field <- strsplit(annot$V3[row], split = "]")[[1]][1]
      num.genes <- str_count(site.class.field, ";") + 1
      output <- cbind(rep(annot$V1[row], num.genes), 
                      do.call("rbind", lapply(strsplit(site.class.field, split = ";")[[1]], 
                                              function(x) strsplit(x, split = "[|]")[[1]])))
      if(ncol(output) == 5) return(output[,c(1,2,4,5)]) # only return SNPs that have some annotation. Don't get the gene symbol
      else return(NULL)
    }) %>% do.call("rbind", .)
  }
  
  variant.details <- lapply(chunker(1:nrow(annot), max_chunk_size = 10000), get.info) %>% 
    do.call("rbind", .) %>% as.data.frame()
  
  names(variant.details) <- c("SNP", "FBID", "site.class", "distance.to.gene")
  variant.details$FBID <- unlist(str_extract_all(variant.details$FBID, "FBgn[:digit:]+")) # clean up text strings for Flybase ID
  variant.details %>%
    dplyr::filter(site.class != "FBgn0003638") %>% # NB this is a bug in the DGRP's annotation file
    mutate(chr = str_remove_all(substr(SNP, 1, 2), "_")) # get chromosome now for faster sorting later
  
  annotations <- variant.details
} else annotations <- read_csv("data/derived/annotations.csv")

annotations <-
  annotations %>% 
  left_join(read.csv("data/Input/all_dmel_genes.csv")) %>% 
  dplyr::select(SNP, FBID, site.class, distance.to.gene, gene_name, chromosome)
```

## The raw dataset

```{r}
raw_data <- 
  read_delim("data/Input/Raw_data/all_raw_data.csv",delim=',') %>% 
  mutate(line = as.factor(line),
         Treatment = as.character(Treatment)) %>% 
  unite(Treatment, c("Study", "Treatment", "Sex"), sep = "_") %>% 
  filter(Genotyped == "YES") %>% 
  #unite("Treatment", c(Treatment, Sex), sep = "_") %>% 
  dplyr::select(line, Lifespan, Treatment, Vial_ID)
```

## Line mean data

In the `demographic component` of this study, we calculated mean values and standard error for each combination of line, sex, study, temperature and mating status. These data are displayed, and can be downloaded from the below table. Note that for quantitative genetic, GWA and other SNP based analysis, we removed lines that had not been genotyped (shown as `Genotyped = NO`). Lines with unknown genotypes also have unknown Wolbachia and inversions status'. Durham et al (2014) cleared all lines of Wolbachia via treatment with tetracycline.

```{r}
#| results: hide

genotyped_lines <- 
  read_csv("data/input/Genotyped_lines.csv") %>% 
  mutate(Genotyped = "YES",
         line = as.factor(line))
  
full_dataset <- 
  read.csv("data/input/lifespan_data.csv") %>% 
  as_tibble() %>% 
  mutate(line = as.factor(Line),
         Treatment = str_replace(Treatment, " ", "_"),
         Treatment = case_when(Temperature == 18 & Study == "Huang_2020" ~ "Huang_2020_1",
                               Temperature == 25 & Study == "Huang_2020" ~ "Huang_2020_2",
                               Temperature == 28 & Study == "Huang_2020" ~ "Huang_2020_3",
                               .default = Treatment)) %>%
  dplyr::select(-Line) %>% 
  left_join(genotyped_lines, by = "line") %>% 
  mutate(Genotyped = if_else(is.na(Genotyped), "NO", Genotyped)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Study, Treatment, Block, e0, SE_e0, h, SE_h, samp, Genotyped)

# DGRP studies often correct for the most common inversions and wolbachia presence. 

inversions_wolbachia <- 
  read_csv("data/Input/inversions_wolbachia.csv") %>%
  mutate(line = as.factor(str_remove(line, "DGRP_")),
         Wolbachia = if_else(Wolbachia == "y", 1, 0),
         across(2:17, ~ case_when(.x == "ST" ~ 0,
                                 .x == "INV/ST" ~ 1,
                                 .x == "INV" ~ 2))) %>% 
  dplyr::select(line, `In(2L)t`, `In(2R)NS`, `In(3R)P`, `In(3R)K`, `In(3R)Mo`, Wolbachia) %>% 
  rename(In_2L_t = `In(2L)t`,
         In_2R_NS = `In(2R)NS`,
         In_3R_P = `In(3R)P`,
         In_3R_K = `In(3R)K`,
         In_3R_Mo = `In(3R)Mo`)
# inversions pruned to those Huang et al 2015 PNAS corrected for

full_dataset <- 
  full_dataset %>% 
  left_join(inversions_wolbachia) %>% 
  mutate(Wolbachia = if_else(Study == "Durham_2014", 0, Wolbachia)) # study cleared wolbachia with tetracycline before phenotyping 
  
my_data_table(full_dataset %>% 
                mutate(across(8:11, ~ round(.x, 2))) %>% 
                dplyr::select(1:13))
```

# $\mathrm{CV}_G$

The coefficient of genetic variation is

$$\mathrm{CV}_G = \frac{100\sqrt{\sigma^2_G}}{\overline{x}}$$ where $\sigma^2_G$ is the genetic variance in the trait of interest and $\overline{x}$ is the mean trait value. This metric allows comparison of genetic variances between traits expressed on different scales. We use it here to get a compare the extent genetic variation in life expectancy and lifespan equality.

First, let's calculate the $\mathrm{CV}_G$ in life expectancy using individual-level data. While we're at it, we can also calculate broad-sense heritability. 

```{r}
# get conventional H^2 for lifespan

if(!file.exists("data/Derived/heritability/conventional_e0.csv")){
# Arya females

Arya_2010_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Arya_2010_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Arya_2010_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Arya_2010_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  tibble(e0_heritability = Arya_2010_1_Female_H2_model$R[[1]],
       SE = Arya_2010_1_Female_H2_model$se[1,],
       Treatment = unique(Arya_2010_1_Female_raw$Treatment))

# CVG 

Arya_2010_f_summ <- summary(Arya_2010_1_Female_H2_model$mod)

CV_G <- tibble(V_G = rnorm(4000, mean = 93.55, sd = 9.672),
       mean_trait_value = rnorm(4000, mean = 57.080, sd = 0.776)) %>% 
  mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
  dplyr::select(V_G, CV_G) %>% 
  summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
  mutate(Treatment = unique(Arya_2010_1_Female_raw$Treatment))

# Arya males

Arya_2010_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Arya_2010_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Arya_2010_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Arya_2010_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Arya_2010_1_Male_H2_model$R[[1]],
       SE = Arya_2010_1_Male_H2_model$se[1,],
       Treatment = unique(Arya_2010_1_Male_raw$Treatment))
  )

#CVG 

Arya_2010_m_summ <- summary(Arya_2010_1_Male_H2_model$mod)

CV_G <- CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 98.29, sd = 9.914),
       mean_trait_value = rnorm(4000, mean = 52.9947, sd = 0.7926)) %>% 
  mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
    dplyr::select(V_G, CV_G) %>% 
  summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
  mutate(Treatment = unique(Arya_2010_1_Male_raw$Treatment))
)

# Huang 18C females

Huang_2020_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_1_Female")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_1_Female_H2_model$R[[1]],
       SE = Huang_2020_1_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_1_Female_raw$Treatment))
  )

#CVG

Huang_2020_1_Female_summ <- summary(Huang_2020_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 423.5, sd = 20.58),
                            mean_trait_value = rnorm(4000, mean = 79.143, sd = 1.543)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_1_Female_raw$Treatment))
  )

# Huang 18C males

Huang_2020_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_1_Male_H2_model$R[[1]],
       SE = Huang_2020_1_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_1_Male_raw$Treatment))
  )

#CVG

Huang_2020_1_Male_summ <- summary(Huang_2020_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 461.2, sd = 21.48),
                            mean_trait_value = rnorm(4000, mean = 86.37, sd = 1.61)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_1_Male_raw$Treatment))
  )

# Huang 25C females

Huang_2020_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_2_Female_H2_model$R[[1]],
       SE = Huang_2020_2_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_2_Female_raw$Treatment))
  )

#CVG

Huang_2020_2_Female_summ <- summary(Huang_2020_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 90.46, sd = 9.511),
                            mean_trait_value = rnorm(4000, mean = 42.7445, sd = 0.7069)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_2_Female_raw$Treatment))
  )


# Huang 25C males

Huang_2020_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_2_Male_H2_model$R[[1]],
       SE = Huang_2020_2_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_2_Male_raw$Treatment))
  )

#CVG

Huang_2020_2_Male_summ <- summary(Huang_2020_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 105.3, sd = 10.26),
                            mean_trait_value = rnorm(4000, mean = 45.2978, sd = 0.7611)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_2_Male_raw$Treatment))
  )


# Huang 28C females

Huang_2020_3_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_3_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_3_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_3_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_3_Female_H2_model$R[[1]],
       SE = Huang_2020_3_Female_H2_model$se[1,],
       Treatment = unique(Huang_2020_3_Female_raw$Treatment))
  )

#CVG

Huang_2020_3_Female_summ <- summary(Huang_2020_3_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 41.47, sd = 6.440),
                            mean_trait_value = rnorm(4000, mean = 28.207, sd = 0.492)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_3_Female_raw$Treatment))
  )

# Huang 28C males

Huang_2020_3_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Huang_2020_3_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Huang_2020_3_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Huang_2020_3_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Huang_2020_3_Male_H2_model$R[[1]],
       SE = Huang_2020_3_Male_H2_model$se[1,],
       Treatment = unique(Huang_2020_3_Male_raw$Treatment))
  )

#CVG

Huang_2020_3_Male_summ <- summary(Huang_2020_3_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 43.91, sd = 6.627),
                            mean_trait_value = rnorm(4000, mean = 27.8709, sd = 0.5054)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Huang_2020_3_Male_raw$Treatment))
  )

# Wilson females 1

Wilson_2020_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Wilson_2020_1_Female")

Wilson_2020_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Wilson_2020_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Wilson_2020_1_Female_H2_model$R[[1]],
       SE = Wilson_2020_1_Female_H2_model$se[1,],
       Treatment = unique(Wilson_2020_1_Female_raw$Treatment))
  )

#CVG

Wilson_2020_1_Female_summ <- summary(Wilson_2020_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 97.04, sd = 9.851),
                            mean_trait_value = rnorm(4000, mean = 40.5357, sd = 0.7809)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Wilson_2020_1_Female_raw$Treatment))
  )

# Wilson females 2

Wilson_2020_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Wilson_2020_2_Female")

Wilson_2020_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Wilson_2020_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Wilson_2020_2_Female_H2_model$R[[1]],
       SE = Wilson_2020_2_Female_H2_model$se[1,],
       Treatment = unique(Wilson_2020_2_Female_raw$Treatment))
  )

#CVG

Wilson_2020_2_Female_summ <- summary(Wilson_2020_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 69.82, sd = 8.356),
                            mean_trait_value = rnorm(4000, mean = 32.2761, sd = 0.6621)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Wilson_2020_2_Female_raw$Treatment))
  )

# Durham females

Durham_2014_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Durham_2014_1_Female")

Durham_2014_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Durham_2014_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Durham_2014_1_Female_H2_model$R[[1]],
       SE = Durham_2014_1_Female_H2_model$se[1,],
       Treatment = unique(Durham_2014_1_Female_raw$Treatment))
  )

#CVG

Durham_2014_1_Female_summ <- summary(Durham_2014_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 76.95, sd = 8.772),
                            mean_trait_value = rnorm(4000, mean = 36.1214, sd = 0.6892)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Durham_2014_1_Female_raw$Treatment))
  )

# Patel females

Patel_2021_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Patel_2021_1_Female")

Patel_2021_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Patel_2021_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Patel_2021_1_Female_H2_model$R[[1]],
       SE = Patel_2021_1_Female_H2_model$se[1,],
       Treatment = unique(Patel_2021_1_Female_raw$Treatment))
  )

#CVG

Patel_2021_1_Female_summ <- summary(Patel_2021_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 145.1, sd = 12.05),
                            mean_trait_value = rnorm(4000, mean = 33.0666, sd = 0.8901)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Patel_2021_1_Female_raw$Treatment))
  )

# Dick 1 females

Dick_2011_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_1_Female_H2_model$R[[1]],
       SE = Dick_2011_1_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_1_Female_raw$Treatment))
  )

#CVG

Dick_2011_1_Female_summ <- summary(Dick_2011_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 34.08, sd = 5.837),
                            mean_trait_value = rnorm(4000, mean = 29.854, sd = 1.019)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_1_Female_raw$Treatment))
  )

# Dick 1 males

Dick_2011_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_1_Male_H2_model$R[[1]],
       SE = Dick_2011_1_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_1_Male_raw$Treatment))
  )

#CVG

Dick_2011_1_Male_summ <- summary(Dick_2011_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 41.15, sd = 6.415),
                            mean_trait_value = rnorm(4000, mean = 27.834, sd = 1.113)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_1_Male_raw$Treatment))
  )

# Dick 2 females

Dick_2011_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_2_Female_H2_model$R[[1]],
       SE = Dick_2011_2_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_2_Female_raw$Treatment))
  )

#CVG

Dick_2011_2_Female_summ <- summary(Dick_2011_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 26.47, sd = 5.145),
                            mean_trait_value = rnorm(4000, mean = 23.1871, sd = 0.8972)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_2_Female_raw$Treatment))
  )

# Dick 2 males

Dick_2011_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_2_Male_H2_model$R[[1]],
       SE = Dick_2011_2_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_2_Male_raw$Treatment))
  )

#CVG

Dick_2011_2_Male_summ <- summary(Dick_2011_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 23.33, sd = 4.831),
                            mean_trait_value = rnorm(4000, mean = 19.9551, sd = 0.8401)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_2_Male_raw$Treatment))
  )

# Dick 3 females

Dick_2011_3_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_3_Female")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_3_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_3_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_3_Female_H2_model$R[[1]],
       SE = Dick_2011_3_Female_H2_model$se[1,],
       Treatment = unique(Dick_2011_3_Female_raw$Treatment))
  )

#CVG

Dick_2011_3_Female_summ <- summary(Dick_2011_3_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rexp(4000, rate = 1/5.039), # note the use of rexp instead of rnorm to avoid neg variance values
                            mean_trait_value = rnorm(4000, mean = 30.4241, sd = 0.8665)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_3_Female_raw$Treatment))
  )

# Dick 3 males

Dick_2011_3_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Dick_2011_3_Male")  #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Dick_2011_3_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Dick_2011_3_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Dick_2011_3_Male_H2_model$R[[1]],
       SE = Dick_2011_3_Male_H2_model$se[1,],
       Treatment = unique(Dick_2011_3_Male_raw$Treatment))
  )

#CVG

Dick_2011_3_Male_summ <- summary(Dick_2011_3_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 40.45, sd = 6.360),
                            mean_trait_value = rnorm(4000, mean = 27.75, sd = 2.15)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Dick_2011_3_Male_raw$Treatment))
  )

# Hoffman 1 females

Hoffman_2021_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_1_Female_H2_model$R[[1]],
       SE = Hoffman_2021_1_Female_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_1_Female_raw$Treatment))
  )

#CVG

Hoffman_2021_1_Female_summ <- summary(Hoffman_2021_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 162.1, sd = 12.73),
                            mean_trait_value = rnorm(4000, mean = 53.241, sd = 3.208)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_1_Female_raw$Treatment))
  )

# Hoffman 1 males

Hoffman_2021_1_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_1_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_1_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_1_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_1_Male_H2_model$R[[1]],
       SE = Hoffman_2021_1_Male_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_1_Male_raw$Treatment))
  )

#CVG

Hoffman_2021_1_Male_summ <- summary(Hoffman_2021_1_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 195.8, sd = 13.99),
                            mean_trait_value = rnorm(4000, mean = 52.761, sd = 3.516)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_1_Male_raw$Treatment))
  )

# Hoffman 2 females

Hoffman_2021_2_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_2_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_2_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_2_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_2_Female_H2_model$R[[1]],
       SE = Hoffman_2021_2_Female_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_2_Female_raw$Treatment))
  )

#CVG

Hoffman_2021_2_Female_summ <- summary(Hoffman_2021_2_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 223.9, sd = 14.96),
                            mean_trait_value = rnorm(4000, mean = 55.492, sd = 4.333)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_2_Female_raw$Treatment))
  )

# Hoffman 2 males

Hoffman_2021_2_Male_raw <-
  raw_data %>% 
  filter(Treatment == "Hoffman_2021_2_Male") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Hoffman_2021_2_Male_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Hoffman_2021_2_Male_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Hoffman_2021_2_Male_H2_model$R[[1]],
       SE = Hoffman_2021_2_Male_H2_model$se[1,],
       Treatment = unique(Hoffman_2021_2_Male_raw$Treatment))
  )

#CVG

Hoffman_2021_2_Male_summ <- summary(Hoffman_2021_2_Male_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 233.8, sd = 15.29),
                            mean_trait_value = rnorm(4000, mean = 59.240, sd = 4.428)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Hoffman_2021_2_Male_raw$Treatment))
  )

# Zhao 1 females

Zhao_2022_1_Female_raw <-
  raw_data %>% 
  filter(Treatment == "Zhao_2022_1_Female") #%>% 
  #mutate(Vial_ID = as.factor(rleid(Vial_ID)))

Zhao_2022_1_Female_H2_model <-
    rpt(Lifespan ~ (1|line),  
        grname = c("line"),  
        data = Zhao_2022_1_Female_raw, 
        datatype = "Gaussian", nboot = 1000, npermut = 0)

conventional_H2 <-
  conventional_H2 %>% bind_rows(
  tibble(e0_heritability = Zhao_2022_1_Female_H2_model$R[[1]],
       SE = Zhao_2022_1_Female_H2_model$se[1,],
       Treatment = unique(Zhao_2022_1_Female_raw$Treatment))
  )

#CVG

Zhao_2022_1_Female_summ <- summary(Zhao_2022_1_Female_H2_model$mod)

CV_G <-
  CV_G %>% bind_rows(tibble(V_G = rnorm(4000, mean = 200.51, sd = 14.160),
                            mean_trait_value = rnorm(4000, mean = 58.362, sd = 3.172)) %>% 
                       mutate(CV_G = 100 * sqrt(V_G) / mean_trait_value) %>% 
                       dplyr::select(V_G, CV_G) %>% 
                       summarise_draws(mean, ~quantile(.x, probs = c(0.025, 0.975))) %>% 
                       mutate(Treatment = unique(Zhao_2022_1_Female_raw$Treatment))
  )

write_csv(conventional_H2, "data/Derived/heritability/conventional_e0.csv")
write_csv(CV_G, "data/Derived/heritability/conventional_CVG.csv")
} else {
  conventional_H2 <- read_delim("data/Derived/heritability/conventional_e0.csv")
  CV_G <- read_delim("data/Derived/heritability/conventional_CVG.csv")}

```

We can also calculate $\mathrm{CV}_G$ directly from line (genotype) means. 

```{r}
CVG_data <-
  full_dataset %>% 
  unite(Treatment, c("Treatment", "Sex"), sep = "_") %>% 
  filter(Genotyped == "YES")

# make a function to update the model and the posterior sample output with the desired trait

CVG_e0_calculator <- function(selected_treatment){
  
  data <- CVG_data %>% filter(Treatment == selected_treatment)
  
  model <- update(e0_VG_model, newdata = data)
  
  posterior <- 
    as_draws_df(model) %>%
    dplyr::select(b_Intercept, sigma) %>% 
    mutate(VG = sigma^2,
           CVG = 100 * sqrt(VG) / abs(b_Intercept)) %>%  # Houle 1992
    mutate(Trait = "e0",
           Treatment = selected_treatment)
  
  posterior
}

CVG_h_calculator <- function(selected_treatment){
  
  data <- CVG_data %>% filter(Treatment == selected_treatment) %>% filter(!is.na(SE_h))
  
  model <- update(h_VG_model, newdata = data)
  
  posterior <- 
    as_draws_df(model) %>%
    dplyr::select(b_Intercept, sigma) %>% 
    mutate(VG = sigma^2,
           CVG = 100 * sqrt(VG) / abs(b_Intercept)) %>%  # Houle 1992
    mutate(Trait = "h",
           Treatment = selected_treatment)
  
  posterior
}

treatment_list <- unique(CVG_data$Treatment)

# Run the models

Run_function <- FALSE # Change this to TRUE to run the models

if(Run_function){
  
d <- CVG_data %>% filter(Treatment == "Arya_2010_1_Female")

e0_VG_model <-
  brm(data = d,
      family = gaussian(),
      e0 | mi(SE_e0) ~ 1,
      chains = 4, cores = 4, 
      seed = 1, iter = 6000, warmup = 2000)

h_VG_model <-
  brm(data = d,
      family = gaussian(),
      h | mi(SE_h) ~ 1,
      chains = 4, cores = 4, 
      seed = 1, iter = 6000, warmup = 2000)  
  
CVG_data_e0 <- map_dfr(treatment_list, CVG_e0_calculator)
CVG_data_h <- map_dfr(treatment_list, CVG_h_calculator)

CVG_data <- bind_rows(CVG_data_e0, 
                      CVG_data_h)
  
  CVG_data %>% 
    write_csv("data/Derived/heritability/CVG_data.csv")
} else {
  CVG_data <- read_csv("data/Derived/heritability/CVG_data.csv")
}

CVG_summarised <-
  CVG_data %>% 
  group_by(Trait, Treatment) %>% 
  summarise_draws(mean, sd, ~quantile(.x, probs = c(0.025, 0.975))) %>%
  ungroup() %>% 
  mutate(across(4:7, ~round(.x, 1))) %>% 
  pivot_wider(names_from = "variable", values_from = 4:7)

```

How do the results of the two methods compare?

```{r}
CVG_comparison <-
  CV_G %>% filter(variable == "CV_G") %>% 
  rename(mean_CVG_conventional = mean,
         `2.5%_CVG conventional` = `2.5%`,
         `97.5%_CVG conventional` = `97.5%`) %>% 
  dplyr::select(-variable) %>% 
  left_join(
    
    CVG_summarised %>% 
      filter(Trait == "e0") %>% 
      rename(line_mean_CVG = mean_CVG,
             `line 2.5%_CVG` = `2.5%_CVG`,
             `line 97.5%_CVG` = `97.5%_CVG`) %>% 
      dplyr::select(Treatment, line_mean_CVG, `line 2.5%_CVG`, `line 97.5%_CVG`)
  )

CVG_comparison %>% 
  ggplot(aes(x = mean_CVG_conventional, y = line_mean_CVG)) +
  geom_abline(intercept = 0, slope = 1, linetype =2) +
  geom_point(size = 2.5) +
  scale_x_continuous(limits = c(0, 40), expand = c(0, 0)) +
  scale_y_continuous(limits = c(0, 40), expand = c(0, 0)) +
  labs(x = "Life expectancy CVG estimated from individual data",
       y = "Life expectancy CVG estimated from line mean data") +
  theme_bw() +
  theme(text = element_text(size = 12))
  
```

$\mathrm{CV}_G$ in life expectancy calculated from individual level data is $\approx$ $\mathrm{CV}_G$ calculated from line means. We therefore treat these line mean estimates as reasonable and use them to compare life expectancy and lifespan equality.

**Table SX**. $CV_G$ estimates for life expectancy and lifespan equality, estimated from genotype means.

```{r}
CVG_summarised %>% 
  dplyr::select(Trait, Treatment, mean_CVG, `2.5%_CVG`, `97.5%_CVG`) %>% 
  pivot_wider(names_from = Trait, values_from = 3:5) %>% 
  dplyr::select(Treatment, mean_CVG_e0, `2.5%_CVG_e0`, `97.5%_CVG_e0`, mean_CVG_h,
                `2.5%_CVG_h`, `97.5%_CVG_h`) %>%
  rename(`Life expectancy CVG` = mean_CVG_e0,
         `Life expectancy 2.5%`= `2.5%_CVG_e0`,
         `Life expectancy 97.5%`= `97.5%_CVG_e0`,
         `Lifespan equality CVG` = mean_CVG_h,
         `Lifespan equality 2.5%`= `2.5%_CVG_h`,
         `Lifespan equality 97.5%`= `97.5%_CVG_h`) %>% 
  kable() %>% 
  kable_styling()
```

# Preparing for univariate GWAS

The preparation of data for univariate GWAS generally follows [Holman and Wong's (2023)](https://academic.oup.com/evolut/article/77/12/2642/7279223) DGRP GWAS of fitness in different contexts. See their associated `workflowr` [report](https://lukeholman.github.io/fitnessGWAS/index.html) for a comprehensive breakdown of their data preparation.

## Loading data used in GWA tests

For GWAS and later CPASSOC, we split the data by study, removed studies that phenotyped \< 100 lines and adjusted line means to account for experimental block where applicable. Importantly, we also split the Wilson et al (2020) data by dietary treatment; while we do not explicitly consider diet in our analysis, lifespan in one dietary treatment is considered a separate trait from lifespan measured in a second dietary treatment.

```{r}
Arya_2010_f <-
  full_dataset %>% 
  filter(Study == "Arya_2010" & Sex == "Female" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Arya_2010_m <-
  full_dataset %>% 
  filter(Study == "Arya_2010" & Sex == "Male" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_18 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 18 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_18 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 18 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_25 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 25 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_25 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 25 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_f_28 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Female" & Temperature == 28 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Huang_2020_m_28 <-
  full_dataset %>% 
  filter(Study == "Huang_2020" & Sex == "Male" & Temperature == 28 & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

# In this study, some lines were measured twice per treatment, and a small subset were measured three times. We take the mean across blocks as the line mean, following the original study.

Wilson_2020_f_1 <-
  full_dataset %>% 
  filter(Treatment == "Wilson_2020_1" & Genotyped == "YES") %>%
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

Wilson_2020_f_2 <-
  full_dataset %>% 
  filter(Treatment == "Wilson_2020_2" & Genotyped == "YES") %>% 
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

# In this study, each line was measured three times. We take the mean across blocks as the line mean

Durham_2014_f <-
  full_dataset %>% 
  filter(Study == "Durham_2014" & Genotyped == "YES") %>% 
  group_by(line) %>% 
  mutate(e0 = mean(e0),
         h = mean(h)) %>% 
  ungroup() %>% 
  distinct(line, .keep_all = TRUE) %>%
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)


Patel_2021_f <-
  full_dataset %>% 
  filter(Study == "Patel_2021" & Genotyped == "YES") %>% 
  mutate(e0_scaled = scale(e0),
         h_scaled = scale(h)) %>% 
  dplyr::select(line, Sex, Temperature, Mated, Treatment, e0, e0_scaled, h, h_scaled)

```

## Install neccessary software and build helper functions

In addition to the `R` packages we load, `plink 1.9` and `beagle` must also be installed. These software packages allow us to wrangle the data into the correct format and impute missing values, both of which are required to run GWAS with the `plink`.

`plink` is run from the terminal, but we pass the terminal command to `R` first, which then writes to the terminal. This makes our analysis reproducible. However, Windows and mac operating systems liase with the terminal differently, meaning different functions are required depending on your operating system. To make this easy we include the following code chunk, where you can specify whether you're a windows or mac user.

```{r}
#| echo: true
Operating_system <- "mac" # change this to "windows" if appropriate. Note that all downstream functions are informed by this
```

```{r}
# build functions to prepare data for GWAS

prep_for_e0_GWAS <- function(data, sex){
data %>% 
  mutate(line = glue("line{line}")) %>% 
  dplyr::select(line, e0)
}

prep_for_h_GWAS <- function(data, sex){
data %>% 
  mutate(line = glue("line{line}")) %>% 
  dplyr::select(line, h)
}

prep_for_ageing_GWAS <- function(data){
  data %>%
    mutate(line = glue("line{line}")) %>% 
    dplyr::select(line, ageing_axis_centered)
}

prep_for_baseline_mortality_GWAS <- function(data){
  data %>%
    mutate(line = glue("line{line}")) %>% 
    dplyr::select(line, baseline_mortality_axis_centered)
}

# I used bigsnpr::download_plink(dir = "code/windows") which puts it in the code folder - I'm using a windows operating system. The macOS version can also be downloaded into "code/macOS" 

# Beagle is a software package for phasing genotypes and imputing ungenotyped markers.
if(Operating_system == "mac"){plink <- paste(getwd(), "code/macOS/plink", sep = "/")}
if(Operating_system == "windows"){plink <- paste(getwd(), "code/windows/plink", sep = "/")}

# only need to download this once - change path depending on operating system
#beagle <- bigsnpr::download_beagle(
 #   dir = "/Users/tkeaney/Library/CloudStorage/OneDrive-JGU/Postdoc/DGRP_lifespan/DGRP_lifespan_inequality/code/macOS") 

# helper function to pass commands to the terminal
# Note that we set `intern = TRUE`, and pass the result of `system()` to `cat()`, 
# ensuring that the Terminal output will be printed in this knitr report.
# 
# This is the mac OS function
if(Operating_system == "mac"){
  run_command_mac <- function(shell_command, wd = getwd(), path = ""){
    cat(system(glue("cd ", wd, path, # tell terminal which directory to work in
                    "\n",shell_command), # on a second terminal line, run the plink command
               intern = TRUE), sep = '\n')  
  }
}

# This is the windows function 
if(Operating_system == "windows"){
  run_command_windows <- function(plink_command, wd = getwd(), path = "") {
    # Specify the full path to the plink executable within the 'code' subdirectory.
    plink_path <- paste(getwd(), "code/windows/plink", sep = "/")
    
    # Create the full shell command with the plink executable.
    command <- glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shQuote(plink_path)} {plink_command}")
    
    # Execute the combined command.
    result <- system(command, intern = TRUE)
    
    # Print the result.
    cat(result, sep = '\n')
    
    # Return the result as a character vector.
    return(result)
  }
  
  # sometimes we need to run terminal commands without plink - for windows, create a slightly different function to do this
  
  run_command_no_plink <- function(shell_command, wd = getwd(), path = "") {
    
    # Create the full shell command with the plink executable.
    command <- glue("cmd.exe /c cd /d {shQuote(file.path(wd, path))} && {shell_command}")
    
    # Execute the combined command.
    result <- system(command, intern = TRUE)
    
    # Print the result.
    cat(result, sep = '\n')
    
    # Return the result as a character vector.
    return(result)
  }
}
```

## Perform SNP quality control and impute missing data

Plink recognises three types of files that are necessary for GWA analysis: the `.bed`, `.bim` and `.fam` files.

`.bed`: the binary biallelic genotype table. Four options are possible:

-   00 = homozygous for minor allele
-   01 = missing genotype
-   10 = heterozygous genotype
-   11 = homozygous for major allele

The overwhelming majority of genotypes in the DGRP are homozygous for one of the alleles (i.e. 00 or 11).

`.bim`: extended variant information file accompanying the `.bed` file. It has six fields:

1.  chromosome code

2.  variant identifier

3.  position in morgans

4.  base-pair coordinate

5.  Minor allele

6.  Major allele

`.fam`: Plink sample information file. It can have the following elements:

1.  Family ID ('FID') (in our case just the DGRP line)

2.  Within-family ID ('IID'; cannot be '0') (in our case just the DGRP line)

3.  Within-family ID of father ('0' if father isn't in dataset)

4.  Within-family ID of mother ('0' if mother isn't in dataset)

5.  Sex code ('1' = male, '2' = female, '0' = unknown) - not important for us because we analyse the sexes separately.

6.  Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data): -9 for us because we supply the phenotypic data later.

We cleaned up the DGRP’s .bed/.bim/.fam files (available from the Mackay lab [website](http://dgrp2.gnets.ncsu.edu/)) by:

1.  Removing any SNPs for which genotypes are missing in \>10% of the 205 DGRP lines. We then use the software `Beagle` to impute the remaining missing genotypes. Imputation takes about half an hour, so ideally you only want to do it once.

2.  Removing SNPs with a minor allele frequency of less than 5% across the 205 lines. We have negligible power to detect associations for rare SNPs that occur at frequencies below this threshold.

In the plink-formatted genotype files, lines fixed for the major allele are coded as 2, and lines fixed for the minor allele as 0. SNPs with negative $\beta$ coefficients therefore indicate that the minor allele is associated with higher trait values, while positive effect sizes means that the minor allele is associated with lower trait values.

```{r}
Run_function <- FALSE # Change this to TRUE to run - read through the code before you do this 

if(Run_function){
  
  # Use Plink to clean and subset the DGRP's SNP data as follows:
  # Only keep SNPs for which at least 90% of the 205 DGRP lines were successfully genotyped (--geno 0.1)
  # Only keep SNPs with a minor allele frequency of 0.05 or higher (--maf 0.05) across the 205 lines
  # Write the processed BIM/BED/FAM files to the data/Derived/plink_output directory
  
  output_directory <-  paste(getwd(), "data/Derived/plink_output", sep = "/")
  
  if(Operating_system == "windows"){
    run_command_windows(glue("--bfile dgrp2",
                             " --geno 0.1 --maf 0.05 --allow-no-sex", 
                             " --make-bed --out {shQuote(output_directory)}/dgrp2_QC_all_lines"), path = "data/Input/bfiles/")
  }
  
  if(Operating_system == "mac"){
    run_command_mac(glue("{plink} --bfile dgrp2",
                         " --geno 0.1 --maf 0.05 --allow-no-sex", 
                         " --make-bed --out ../dgrp2_QC_all_lines"), path = "/data/input/bfiles/")
  }
  # Use the shell command 'sed' to remove underscores from the DGRP line names in the .fam file (e.g. 'line_120' becomes 'line120')
  # Otherwise, these underscores cause trouble when we need to convert from PLINK to vcf format (vcf format uses underscore as a separator)
  if(Operating_system == "windows"){  
    for(i in 1:2) run_command_no_plink("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path = "/data/Derived/")
  }
  
  if(Operating_system == "mac"){            
    for(i in 1:2) run_command_mac("sed -i '' 's/_//' dgrp2_QC_all_lines.fam", path = "/data/Derived/")
  }
  # Now impute the missing genotypes using Beagle
  # This part uses the data for the full DGRP panel of >200 lines, to infer missing genotypes as accurately as possible. 
  # The bigsnpr package provides a helpful wrapper for Beagle called snp_beagleImpute(): it translates to a VCF file and back again using PLINK
  # Imputation with the below optimisation took about 25 mins on my computer, which is a high spec macbook by 2025 standards
  snp_beagleImpute(beagle, plink, 
                   bedfile.in = "data/Derived/plink_output/dgrp2_QC_all_lines.bed", 
                   bedfile.out = "data/Derived/plink_output/dgrp2_QC_all_lines_imputed.bed",
                   ncores = 10, 
                   memory.max = 32)
  
  # assign a sex of 'female' to all the DGRP lines (Beagle removes the sex, and it seems PLINK needs individuals to have a sex, 
  # despite PLINK having a command called --allow-no-sex)
  
  if(Operating_system == "windows"){ 
    run_command_windows("sed -i '' 's/    0   0   0/  0   0   2/' dgrp2_QC_all_lines_imputed.fam", 
                        path = "/data/Derived/plink_output/")
  }
  if(Operating_system == "mac"){ 
    run_command_mac("sed -i '' 's/    0   0   0/  0   0   2/' dgrp2_QC_all_lines_imputed.fam", 
                    path = "/data/Derived/plink_output/")
  }
  # Re-write the .bed file, to make sure the MAF threshold and minor/major allele designations are correctly assigned post-Beagle
  
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed",
                             " --geno 0.1 --maf 0.05 --allow-no-sex", 
                             " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed",
                         " --geno 0.1 --maf 0.05 --allow-no-sex", 
                         " --make-bed --out dgrp2_QC_all_lines_imputed_correct"), path = "/data/Derived/plink_output/")
  }
  #unlink(list.files("data/derived", pattern = "~", full.names = TRUE)) # delete the original files, which were given a ~ file name by PLINK
}
```

## Get minor allele frequencies in the DGRP

```{r}
# Use PLINK to get the allele IDs and calculate the MAFs across the whole DGRP, for all SNPs that survived QC
  # The file created is called data/derived/plink.frq
if(!file.exists("data/Derived/plink_output/plink.frq")){
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --freq"), path = "/data/Derived/plink_output/")
  }
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --freq"), path = "/data/Derived/plink_output/")
  }
}

# Extract and save the MAFs, SNP positions, and major/minor alleles
MAFs <- 
  read.table("data/Derived/plink_output/plink.frq", header = TRUE, stringsAsFactors = FALSE) %>% 
  mutate(position = map_chr(
    strsplit(SNP, split = "_"), 
    function(x) x[2])) %>%
  dplyr::select(SNP, position, MAF, A1, A2) %>% 
  rename(minor_allele = A1,
         major_allele = A2) %>% 
  as_tibble()
```

## Create a reduced list of LD-pruned SNPs with PLINK

1,646,615 variants passed the MAF and missingness quality control. However, proximity causes strong linkage disequilibrium, such that neighbouring SNPs tend to have similar associations with the trait under consideration in GWAS. Separate genomic regions can be identified by pruning the number of SNPs within a genomic region using the `plink` arguments `--indep-pairwise 100 10 0.2`, which prunes SNPs within 100kB sliding windows, sliding 10 variants along with each step, and allows a maximum pairwise correlation ($r^2$) threshold of 0.2 between loci within the window. With these parameters, 1,419,773 variants were removed, leaving 226,842.

```{r}
# indep-pairwise arguments are: 
# 100kB window size, 
# variant count to shift the window by 10 variants at the end of each step, 
# pairwise r^2 threshold of 0.2

if(!file.exists("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.out")) {
  
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --indep-pairwise 100 10 0.2"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --indep-pairwise 100 10 0.2"), path = "/data/Derived/plink_output/")
  }
}

Genomic_regions <-
  read.table("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in") %>% 
  rename(SNP = V1)

```

## Build GWAS function

We use the `--assoc` flag to run a basic linear regression, fit with two data points: the mean phenotype for individuals homozygous for the minor allele and the mean phenotype for individuals homozygous for the major allele. The effect size ($\beta$) is the slope of this regression line. Negative effect sizes indicate that the minor allele is associated with a higher trait value than the major allele. The test statistic `T` and p-value are produced by a Wald test.

```{r}
run_GWAS <- function(phenotypes){
  
  # Make a list of the lines in our sample and save as a text file for passing to PLINK
  lines_to_keep <- phenotypes %>% dplyr::select(line) %>% mutate(line_2 = line)
  write.table(lines_to_keep, 
              row.names = FALSE, 
              col.names = FALSE, 
              file = "data/Derived/plink_output/lines_to_keep.txt", 
              quote = FALSE)

  # Now cull the PLINK files to just the lines that we measured, and re-apply the 
  # MAF cut-off of 0.05 for the new smaller sample of DGRP lines
  if(Operating_system == "windows"){ 
    run_command_windows(glue("--bfile dgrp2_QC_all_lines_imputed_correct",
                             " --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole
                             " --keep lines_to_keep.txt --geno 0.1 --maf 0.05", 
                             " --make-bed --out dgrp2_QC_focal_lines"), path = "/data/Derived/plink_output/")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac(glue("{plink} --bfile dgrp2_QC_all_lines_imputed_correct",
                         " --keep-allele-order", # force PLINK to retain the major/minor allele designations that apply to the DGRP as a whole
                         " --keep lines_to_keep.txt --geno 0.1 --maf 0.05", 
                         " --make-bed --out dgrp2_QC_focal_lines"), path = "/data/Derived/plink_output/")
  }
  

  
    # Define a function to add our phenotype data to a .fam file, which is needed for GWAS analysis and to make sure PLINK includes these samples
  # The 'phenotypes' data frame needs to have a column called 'line'
  add_phenotypes_to_fam <- function(filepath, phenotypes){
    read_delim(filepath, col_names = FALSE, delim = " ") %>% 
      dplyr::select(X1, X2, X3, X4, X5) %>% # Get all the non-phenotype columns
      left_join(phenotypes, 
                by = c("X1" = "line")) %>%
      write.table(file = "data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", 
                  col.names = FALSE, row.names = FALSE, 
                  quote = FALSE, sep = " ")
    
    unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
    file.rename("data/Derived/plink_output/dgrp2_QC_focal_lines_NEW.fam", 
                "data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
  }
  
  # edit the new FAM file to add the phenotype data from 'phenotypes'
  add_phenotypes_to_fam("data/Derived/plink_output/dgrp2_QC_focal_lines.fam", phenotypes)
  
  # Run GWAS 
  if(Operating_system == "windows"){ 
    run_command_windows("--bfile dgrp2_QC_focal_lines  --assoc --maf 0.05 --allow-no-sex", 
                        path = "/data/Derived/plink_output")
  }
  
  if(Operating_system == "mac"){ 
    run_command_mac("{plink} --bfile dgrp2_QC_focal_lines  --assoc --maf 0.05 --allow-no-sex", 
                    path = "/data/Derived/plink_output")
  }
  
  # wrangle the GWAS results
  
  Focal_name <- deparse(substitute(phenotypes))
  
  gwas_results <- read.table("data/Derived/plink_output/plink.qassoc", 
                             header = TRUE) %>% 
    dplyr::select(SNP, BETA, SE, "T", P)

  # Rename and compress the GWAS summary stats file 
  # The filter step means that only variants in the LD-pruned subset get saved to disk.
  gwas_results %>% 
  #  filter(SNP %in% (pull(read_tsv("data/Derived/plink_output/dgrp2_QC_all_lines_imputed_correct_pruned.prune.in", col_names = "SNP"), SNP))) %>% 
    write_tsv(glue("data/Derived/GWAS_results/{Focal_name}.tsv.gz"))
  unlink("data/Derived/plink_output/plink.qassoc")
  
  # Rename the plink log file
  file.rename("data/Derived/plink_output/plink.log",
              glue("data/Derived/plink_output/{Focal_name}_log.txt"))
  
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bim")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.bed")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.fam")
  unlink("data/Derived/plink_output/dgrp2_QC_focal_lines.log")
} 
        
```

## Build manhattan plot function

```{r}
build_manhattan_plot <- function(gwas_results){
  
  manhattan_data <- gwas_results %>%
    mutate(position = str_split(SNP, "_"), # split the SNP name into logical bits
           chr = map_chr(position, ~ .x[1]), # the first bit is the chromosome arm - name the column appropriately
           position = as.numeric(map_chr(position, ~ .x[2])), # where on the chromosome is the SNP found
           pval = -1 * log10(P)) %>% # make visualising the p values easier
    dplyr::select(chr, position, pval) %>% 
    filter(chr != "4")
  
  # this next chunk finds convenient cuts for labels and colour changes 
  
  max_pos <- manhattan_data %>%
    group_by(chr) %>%
    summarise(
      max_pos = max(position), 
      middle_pos = (max_pos - min(position)) / 2,
      .groups = "drop") %>%
    as.data.frame()
  
  max_pos$max_pos <- c(0, cumsum(max_pos$max_pos[1:4]))
  
  max_pos$label_pos <- max_pos$max_pos + max_pos$middle_pos
  
  # combine the two dataframes created above
  
  manhattan_data <- manhattan_data %>%
    left_join(max_pos, by = "chr") %>%
    mutate(position = position + max_pos,
           chromosome_colour = case_when(chr == "2L" | chr == "3L" | chr == "X" ~ "A",
                                         .default = "B"),
           Notable = if_else(pval >= -log10(1e-08), "YES", "NO"))
  
  plot <- manhattan_data %>% filter(Notable == "NO") %>% 
    ggplot(aes(position, pval, group = chr, stroke = 0.01)) +
    geom_point(aes(colour = chromosome_colour), size = 1.8, alpha = 1) +
    geom_point(data =manhattan_data %>% filter(Notable == "YES"),
               aes(fill = chromosome_colour), colour = "black", shape = 21, size = 3.5, alpha = 1) +
    geom_hline(yintercept = -log10(1e-08), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +
    #geom_hline(yintercept = -log10(1e-05), linetype = 2, colour = "#33271e", linewidth = 1, alpha = 0.8) +
    scale_colour_manual(values = c(met.brewer(name = "Hokusai3")[3], met.brewer(name = "Hokusai3")[6])) +
    scale_fill_manual(values = c(met.brewer(name = "Hokusai3")[3], met.brewer(name = "Hokusai3")[6])) +
    scale_x_continuous(breaks = max_pos$label_pos, labels = max_pos$chr) +
    ylab("-log~10~(_p_)") + 
    xlab("Chromosome and position") +
    theme_classic() +
    theme(legend.position = "none",
          axis.title.y = element_markdown(size = 14),
          axis.title.x = element_markdown(size = 14),
          axis.text.x = element_text(size = 12),
          axis.text.y = element_text(size = 12))  
}
```

# Preparing for cross phenotype meta-analysis

The power to detect variants associated with genetically correlated phenotypes can be increased if a meta-analytic approach is adopted ([Zhu et al. 2018](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193256)). Here, we used the cross-phenotype association (hereafter `CPASSOC`) approach developed by [Zhou et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0002929714004777?via%3Dihub), which evaluates the null hypothesis that SNPs are not associated with any of the considered traits, weighted by the sample size of each study and adjusted for the trait genetic correlation matrix. In less language, CPASSOC evaluates the aggregated evidence of an association between a SNP and multiple phenotypes. The steps to apply `CPASSOC` are as follows:

1. Estimate $R$, the trait correlation matrix. In the DGRP, this can be done using SNP data (pruned to minimise the effect of linkage disequilibrium) or using line means.

2. GWAS each trait separately (see above).

3. Collate effect sizes for each trait into a vector $\mathbf{\beta}$ for each SNP.

4. Use a Wald test to estimate a test statistic $T_{ijk}$ for the $i^{th}$ SNP, $j^{th}$ cohort and $k^{th}$ treatment condition:

$$T_{ijk} = \frac{\hat\beta_{ijk}}{\hat{s}_{ijk}}$$ 
, where $\hat\beta_{ijk}$ and $\hat{s}_{ijk}$ are the estimated coefficient and standard error for the $i^{th}$ SNP in the $j^{th}$, for the $k^{th}$ treatment condition. From individual test statistics, a vector holding test statistics for all traits ($T$) can be built.

5. Test whether $\mathbf{\beta} = \mathbf{0}$. If the trait data are homogeneous (SNPs are expected to affect all traits in the same direction and at the same magnitude), use:

$$S_{Hom} = \frac{e^T(RW)^{-1}T(e^T(RW)^{-1}T)^T}{e^T(WRW)^{-1}e}$$
where $W$ is a diagonal matrix of weights for the individual test statistics (either the inverse of the variance or in our case the square root of the sample size of the $j^{th}$ cohort: $\sqrt{n_j}$).

6. If there is heterogeneity between trait measures (i.e. it is a reasonable expectation that genetic variants could affect some traits in one direction and others in the opposing direction), $S_{Hom}$ is not appropriate. The ideal test statistic in this case is agnostic to the sign of a genetic variant's phenoypic effect and includes only the cohorts and traits with a true contribution to the association of a genetic variant. To implement this, the absolute value $\tau$ is used as a threshold, below which traits are not included in the test statistic. To allow for for effects of different signs in different environmental contexts, let $w_{ijk} = \sqrt{n_j}\times \mathrm{sign}(T_{ijk})$. To calculate this heterogenous summary statistic first find,

$$S_{(\tau)} = \frac{e^T(R(\tau)W(\tau))^{-1}T(\tau)(e^T(R(\tau)W(\tau))^{-1}T(\tau))^T}{e^TW(\tau)^{-1}R(\tau)^{-1}W(\tau)^{-1}e}$$

This statistic, $S_{Het}$ can be viewed as the maximum of the weighted sum of trait-specific test statistics that satisfy the $\tau$ threshold.

 When $\tau$ is large, $S(\tau)$ can be undefined if the test statistic falls below $\tau$ in all contexts and cohorts. In this case $S(\tau) = 0$. Our test statistic is then

$$\displaystyle S_{Het} =  \max_{\tau \gt 0} S(\tau)$$ 
Note that the restriction imposed by $\tau$ and the sign specific $w_{ijk}$ are the only differences between $S_{Het}$ and $S_{Hom}$.

The inclusion of $\tau$ might give the impression of 'cherry picking'. However, the value of $S_{Het}$ lies in increasing power relative to univariate GWAS not by assessing if all tests return a strong, concordant association, but by identifying that more than one phenotype is strongly associated with a SNP. The more phenotypes associated, the larger $S_{Het}$ becomes.

To generate *p*-values, $S_{Het}$ is compared to a gamma distribution with a mean shift of test-statistics (see [Zhou et al. (2015)](https://www.sciencedirect.com/science/article/pii/S0002929714004777?via%3Dihub)).

Code to implement both statistics in `R` can be downloaded [here](http://hal.case.edu/~xxz10/zhu-web/), and is called below.

## The functions

These are directly loaded from [here](http://hal.case.edu/~xxz10/zhu-web/)

```{r}
require(compiler)
enableJIT(4)

Non_Trucated_TestScore <- function(X, SampleSize, CorrMatrix)
{
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Sigma = ginv(CorrMatrix);
  XX = apply(X, 1, function(x) {
    x1 <- matrix(x, ncol = length(x), nrow = 1);
    T = W %*% Sigma %*% t(x1);
    T = (T*T) / (W %*% Sigma %*% t(W));
    return(T[1,1]);
  }
  );
  return(XX);
}
SHom <- cmpfun(Non_Trucated_TestScore);

Trucated_TestScore <- function(X, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T)
{
  N = dim(X)[2];
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  XX = apply(X, 1, function(x) {
    TTT = -1;
    
    if (isAllpossible == T ) {
      cutoff = sort(unique(abs(x)));      ## it will filter out any of them.
    } else {
      cutoff = seq(startCutoff, endCutoff, CutoffStep);
    }
    
    for (threshold in cutoff) {
      x1 = x;
      index = which(abs(x1) < threshold);
      
      if (length(index) == N) break;
      
      A = CorrMatrix;
      
      W1 = W;
      if (length(index) !=0 ) {
        x1 = x1[-index];
        A  = A[-index, -index];   ## update the matrix
        W1 = W[-index];
      }
      
      if (correct == 1)
      {
        index = which(x1 < 0);
        if (length(index) != 0) {
          W1[index] = -W1[index];    ## update the sign
        }
      }
      
      A = ginv(A);
      x1 = matrix(x1, nrow = 1);
      W1 = matrix(W1, nrow = 1);
      T = W1 %*% A %*% t(x1);
      T = (T*T) / (W1 %*% A %*% t(W1));
      
      if (TTT < T[1,1]) TTT = T[1,1];
    }
    return(TTT);
  }
  );
  return(XX);
}
SHet <- cmpfun(Trucated_TestScore);

EstimateGamma <- function (N = 1E6, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T) {
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Permutation = mvrnorm(n = N, mu = c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol = 1e-8, empirical = F);
  
  Stat =  Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix,
                             correct = correct, startCutoff = startCutoff, endCutoff = endCutoff,
                             CutoffStep = CutoffStep, isAllpossible = isAllpossible);
  a = min(Stat)*3/4
  ex3 = mean(Stat*Stat*Stat)
  V =   var(Stat);
  
  for (i in 1:100){
    E = mean(Stat)-a;
    k = E^2/V
    theta = V/E
    a = (-3*k*(k+1)*theta**2+sqrt(9*k**2*(k+1)**2*theta**4-12*k*theta*(k*(k+1)*(k+2)*theta**3-ex3)))/6/k/theta
  }
  
  para = c(k,theta,a);
  return(para);
}

EmpDist <- function (N = 1E6, SampleSize, CorrMatrix, correct = 1, startCutoff = 0, endCutoff = 1, CutoffStep = 0.05, isAllpossible = T) {
  
  Wi = matrix(SampleSize, nrow = 1);
  sumW = sqrt(sum(Wi^2));
  W = Wi / sumW;
  
  Permutation = mvrnorm(n = N, mu = c(rep(0, length(SampleSize))), Sigma = CorrMatrix, tol = 1e-8, empirical = F);
  
  Stat =  Trucated_TestScore(X = Permutation, SampleSize = SampleSize, CorrMatrix = CorrMatrix, correct = correct, startCutoff = startCutoff, endCutoff = endCutoff, CutoffStep = CutoffStep, isAllpossible = isAllpossible);
  
  return(Stat);
}
```

# Analysing life expectancy and lifespan equality

## Run univariate GWAS

Run GWAS for each environmental context and save the results.

```{r}
# prepare phenotype data for GWAS

Arya_f_l <- prep_for_e0_GWAS(Arya_2010_f)
Arya_m_l <- prep_for_e0_GWAS(Arya_2010_m)
Arya_f_h <- prep_for_h_GWAS(Arya_2010_f)
Arya_m_h <- prep_for_h_GWAS(Arya_2010_m)
Huang_f_18_l <- prep_for_e0_GWAS(Huang_2020_f_18)
Huang_f_18_h <- prep_for_h_GWAS(Huang_2020_f_18)
Huang_m_18_l <- prep_for_e0_GWAS(Huang_2020_m_18)
Huang_m_18_h <- prep_for_h_GWAS(Huang_2020_m_18)
Huang_f_25_l <- prep_for_e0_GWAS(Huang_2020_f_25)
Huang_f_25_h <- prep_for_h_GWAS(Huang_2020_f_25)
Huang_m_25_l <- prep_for_e0_GWAS(Huang_2020_m_25)
Huang_m_25_h <- prep_for_h_GWAS(Huang_2020_m_25)
Huang_f_28_l <- prep_for_e0_GWAS(Huang_2020_f_28)
Huang_f_28_h <- prep_for_h_GWAS(Huang_2020_f_28)
Huang_m_28_l <- prep_for_e0_GWAS(Huang_2020_m_28)
Huang_m_28_h <- prep_for_h_GWAS(Huang_2020_m_28)
Wilson_f_l_1 <- prep_for_e0_GWAS(Wilson_2020_f_1)
Wilson_f_h_1 <- prep_for_h_GWAS(Wilson_2020_f_1)
Wilson_f_l_2 <- prep_for_e0_GWAS(Wilson_2020_f_2)
Wilson_f_h_2 <- prep_for_h_GWAS(Wilson_2020_f_2)
Durham_f_l <- prep_for_e0_GWAS(Durham_2014_f)
Durham_f_h <- prep_for_h_GWAS(Durham_2014_f)
Patel_f_l <- prep_for_e0_GWAS(Patel_2021_f)
Patel_f_h <- prep_for_h_GWAS(Patel_2021_f)

# if not already done, run the GWA tests

if(!file.exists("data/Derived/GWAS_results/Arya_f_l.tsv.gz")) {
run_GWAS(Arya_f_l)
run_GWAS(Arya_m_l)
run_GWAS(Arya_f_h)
run_GWAS(Arya_m_h)
run_GWAS(Huang_f_18_l)
run_GWAS(Huang_f_18_h)
run_GWAS(Huang_m_18_l)
run_GWAS(Huang_m_18_h)
run_GWAS(Huang_f_25_l)
run_GWAS(Huang_f_25_h)
run_GWAS(Huang_m_25_l)
run_GWAS(Huang_m_25_h)
run_GWAS(Huang_f_28_l)
run_GWAS(Huang_f_28_h)
run_GWAS(Huang_m_28_l)
run_GWAS(Huang_m_28_h)
run_GWAS(Wilson_f_l_1)
run_GWAS(Wilson_f_h_1)
run_GWAS(Wilson_f_l_2)
run_GWAS(Wilson_f_h_2)
run_GWAS(Durham_f_l)
run_GWAS(Durham_f_h)
run_GWAS(Patel_f_l)
run_GWAS(Patel_f_h)
}
```

Load the results

```{r}
# load GWAS results

# Life expectancy

Arya_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_l.tsv.gz") 
  
Huang_f_18_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_l.tsv.gz")

Huang_f_25_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_l.tsv.gz") 

Huang_f_28_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_l.tsv.gz")

Wilson_f_l_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_l_1.tsv.gz") 

Wilson_f_l_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_l_2.tsv.gz") 

Durham_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_l.tsv.gz")

Patel_f_l_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_l.tsv.gz")

Arya_m_l_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_l.tsv.gz")

Huang_m_18_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_l.tsv.gz")

Huang_m_25_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_l.tsv.gz")

Huang_m_28_l_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_l.tsv.gz")

# Lifespan equality

Arya_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_h.tsv.gz")
  
Huang_f_18_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_h.tsv.gz")

Huang_f_25_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_h.tsv.gz") 

Huang_f_28_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_h.tsv.gz")

Wilson_f_h_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_h_1.tsv.gz")

Wilson_f_h_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_h_2.tsv.gz")

Durham_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_h.tsv.gz")

Patel_f_h_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_h.tsv.gz")

Arya_m_h_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_h.tsv.gz")

Huang_m_18_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_h.tsv.gz")

Huang_m_25_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_h.tsv.gz")

Huang_m_28_h_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_h.tsv.gz")

```

As a point of comparison, we find the sum of significant associations detected by univariate GWAS

**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **life expectancy**. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.

```{r}
# filter down to sig associations
e0_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_l_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_l_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_l_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_l_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_l_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_l_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_l_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_l_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_l_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_l_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
p_05_SNPs_l <-
  bind_rows(
    
    Arya_f_l_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_l_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_l_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_l_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_l_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_l_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_l_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

e0_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(p_05_SNPs_l),
          `p < 1e-05 genomic regions` = nrow(p_05_SNPs_l %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(e0_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(e0_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
```


**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **lifespan equality**. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis. The number of genomic regions indicates the number of assocations found after LD pruning.

```{r}
# filter down to sig associations
h_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_f_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_18_h_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_25_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_f_28_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_h_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_h_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_h_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_h_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Wilson_f_h_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Durham_f_h_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Patel_f_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Arya_m_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_18_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_25_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_h_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_h_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, Huang_m_28_h_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
p_05_SNPs_h <-
  bind_rows(
    
    Arya_f_h_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_h_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_h_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_h_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_h_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_h_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_h_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

h_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(p_05_SNPs_h),
          `p < 1e-05 genomic regions` = nrow(p_05_SNPs_h %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(h_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(h_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
```

## Applying cross-phenotype meta-analysis

### Generate the genetic correlation matrix

We calculate the genetic correlations between traits using both the line mean and SNP effect size comparisons. Following Zhu et al. (2015), we use the SNP correlations for analysis.

```{r}
# use the BETA coefficients to build the SNP correlation matrix

SNP_beta_e0 <-
  bind_rows(
    Arya_f_l_GWAS %>% 
      mutate(Study = "Arya_2010", Sex = "Female", Temperature = 25),
    Huang_f_18_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 18),
    Huang_f_25_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 25),
    Huang_f_28_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 28),
    Wilson_f_l_1_GWAS %>% 
      mutate(Study= "Wilson_2020_1", Sex= "Female", Temperature= 25),
    Wilson_f_l_2_GWAS %>% 
      mutate(Study= "Wilson_2020_2", Sex= "Female", Temperature= 25),
    Durham_f_l_GWAS %>% 
      mutate(Study= "Durham_2014", Sex= "Female", Temperature= 25),
    Patel_f_l_GWAS %>% 
      mutate(Study= "Patel_2021", Sex= "Female", Temperature= 23),
    Arya_m_l_GWAS %>% 
      mutate(Study= "Arya_2010", Sex= "Male", Temperature= 25),
    Huang_m_18_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature= 18),
    Huang_m_25_l_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature = 25),
    Huang_m_28_l_GWAS %>% 
      mutate(Study = "Huang_2020", Sex = "Male", Temperature = 28)) %>% 
  dplyr::select(SNP, BETA, Study, Sex, Temperature) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Sex, Temperature)) %>% 
  rename(Arya_f_25 = Arya_2010_Female_25,
         Huang_f_18 = Huang_2020_Female_18,
         Huang_f_25 = Huang_2020_Female_25,
         Huang_f_28 = Huang_2020_Female_28,
         Wilson_f_25_1 = Wilson_2020_1_Female_25,
         Wilson_f_25_2 = Wilson_2020_2_Female_25,
         Durham_f_25 = Durham_2014_Female_25,
         Patel_f_23 = Patel_2021_Female_23,
         Arya_m_25 = Arya_2010_Male_25,
         Huang_m_18 = Huang_2020_Male_18,
         Huang_m_25 = Huang_2020_Male_25,
         Huang_m_28 = Huang_2020_Male_28)

SNP_beta_e0_LD_pruned <-
  SNP_beta_e0 %>% 
  inner_join(Genomic_regions)

SNP_beta_h <-
  bind_rows(
    Arya_f_h_GWAS %>% 
      mutate(Study = "Arya_2010", Sex = "Female", Temperature = 25),
    Huang_f_18_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 18),
    Huang_f_25_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 25),
    Huang_f_28_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Female", Temperature= 28),
    Wilson_f_h_1_GWAS %>% 
      mutate(Study= "Wilson_2020_1", Sex= "Female", Temperature= 25),
    Wilson_f_h_2_GWAS %>% 
      mutate(Study= "Wilson_2020_2", Sex= "Female", Temperature= 25),
    Durham_f_h_GWAS %>% 
      mutate(Study= "Durham_2014", Sex= "Female", Temperature= 25),
    Patel_f_h_GWAS %>% 
      mutate(Study= "Patel_2021", Sex= "Female", Temperature= 23),
    Arya_m_h_GWAS %>% 
      mutate(Study= "Arya_2010", Sex= "Male", Temperature= 25),
    Huang_m_18_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature= 18),
    Huang_m_25_h_GWAS %>% 
      mutate(Study= "Huang_2020", Sex= "Male", Temperature = 25),
    Huang_m_28_h_GWAS %>% 
      mutate(Study = "Huang_2020", Sex = "Male", Temperature = 28)) %>% 
  dplyr::select(SNP, BETA, Study, Sex, Temperature) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Sex, Temperature)) %>% 
  rename(Arya_f_25 = Arya_2010_Female_25,
         Huang_f_18 = Huang_2020_Female_18,
         Huang_f_25 = Huang_2020_Female_25,
         Huang_f_28 = Huang_2020_Female_28,
         Wilson_f_25_1 = Wilson_2020_1_Female_25,
         Wilson_f_25_2 = Wilson_2020_2_Female_25,
         Durham_f_25 = Durham_2014_Female_25,
         Patel_f_23 = Patel_2021_Female_23,
         Arya_m_25 = Arya_2010_Male_25,
         Huang_m_18 = Huang_2020_Male_18,
         Huang_m_25 = Huang_2020_Male_25,
         Huang_m_28 = Huang_2020_Male_28)

SNP_beta_h_LD_pruned <-
  SNP_beta_h %>% 
  inner_join(Genomic_regions)
  

SNP_e0_corr_matrix <- cor(SNP_beta_e0_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")
SNP_h_corr_matrix <- cor(SNP_beta_h_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")


line_data <-
  bind_rows(Arya_2010_f,
            Huang_2020_f_18,
            Huang_2020_f_25,
            Huang_2020_f_28,
            Wilson_2020_f_1,
            Wilson_2020_f_2,
            Durham_2014_f,
            Patel_2021_f,
            Arya_2010_m,
            Huang_2020_m_18,
            Huang_2020_m_25,
            Huang_2020_m_28) %>% 
  dplyr::select(line, Treatment, Sex, Temperature, e0, h) %>% 
  pivot_wider(values_from = c(e0, h), names_from = c(Treatment, Sex, Temperature)) 

line_data_e0 <-
  line_data %>% 
  dplyr::select(contains("e0")) %>% 
  rename(Arya_f_25 = e0_Arya_2010_1_Female_25,
         Huang_f_18 = e0_Huang_2020_1_Female_18,
         Huang_f_25 =  e0_Huang_2020_2_Female_25,
         Huang_f_28 = e0_Huang_2020_3_Female_28,
         Wilson_f_25_1 = e0_Wilson_2020_1_Female_25,
         Wilson_f_25_2 = e0_Wilson_2020_2_Female_25,
         Durham_f_25 = e0_Durham_2014_1_Female_25,
         Patel_f_23 = e0_Patel_2021_1_Female_23,
         Arya_m_25 = e0_Arya_2010_1_Male_25,
         Huang_m_18 = e0_Huang_2020_1_Male_18,
         Huang_m_25 = e0_Huang_2020_2_Male_25,
         Huang_m_28 = e0_Huang_2020_3_Male_28)

line_data_h <-
  line_data %>% 
  dplyr::select(!contains("e0"), -line) %>% 
  rename(Arya_f_25 = h_Arya_2010_1_Female_25,
         Huang_f_18 = h_Huang_2020_1_Female_18,
         Huang_f_25 =  h_Huang_2020_2_Female_25,
         Huang_f_28 = h_Huang_2020_3_Female_28,
         Wilson_f_25_1 = h_Wilson_2020_1_Female_25,
         Wilson_f_25_2 = h_Wilson_2020_2_Female_25,
         Durham_f_25 = h_Durham_2014_1_Female_25,
         Patel_f_23 = h_Patel_2021_1_Female_23,
         Arya_m_25 = h_Arya_2010_1_Male_25,
         Huang_m_18 = h_Huang_2020_1_Male_18,
         Huang_m_25 = h_Huang_2020_2_Male_25,
         Huang_m_28 = h_Huang_2020_3_Male_28)

line_e0_corr_matrix <- cor(line_data_e0, use = "pairwise.complete.obs", method = "spearman")
line_h_corr_matrix <- cor(line_data_h, use = "pairwise.complete.obs", method = "spearman")
```

Let's visualise the genetic correlation between lifespan measures. First for life expectancy:

```{r}
breaksList <- seq(-1, 1, by = 0.02)

pheatmap(SNP_e0_corr_matrix, breaks = breaksList, 
main = "", legend_labels = c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),
color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
legend = TRUE, cutree_rows = 3, cutree_cols = 3, angle_col = 45, border_color = "white")
```

Now for lifespan equality

```{r}
pheatmap(SNP_h_corr_matrix, breaks = breaksList, 
main = "", legend_labels = c("-1", "-0.5", "0", "0.5", "Genetic correlation\n"),
color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
legend = TRUE, cutree_rows = 3, cutree_cols = 3, angle_col = 45, border_color = "white")
```

### Calculate meta-analytic test statistics

The purpose of this meta-analysis is to search for SNPs that have some effect on life expectancy or lifespan equality, expressed in many different contexts (sexes, temperatures and mating status').

To conduct CPASSOC for a given SNP, we need a $T$ statistic from each environmental context. A different number of lines were included in each GWAS, which caused small differences in the number of SNPs assessed in each cohort. We therefore prune the list of SNPs to those included in all univariate analyses. After pruning, 1,603,213 SNPs remain.

The Bonferroni adjusted significance threshold for this number of tests is $p_{adj} = \frac{0.05}{1603213} = 3.12\times 10^{-8}$; here and for all future analysis, we use *p* $< 10^{-8}$ as our significance threshold, as this is slightly more conservative and easier to quickly interpret.

#### Life expectancy

```{r}
Arya_f_l_T <- Arya_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
  
Huang_f_18_l_T <- Huang_f_18_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)

Huang_f_25_l_T <- Huang_f_25_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)

Huang_f_28_l_T <- Huang_f_28_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)

Wilson_f_l_1_T <- Wilson_f_l_1_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_1 = T)

Wilson_f_l_2_T <- Wilson_f_l_2_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_2 = T)

Durham_f_l_T <- Durham_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Durham_f_25 = T)

Patel_f_l_T <- Patel_f_l_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Patel_f_23 = T)

Arya_m_l_T <- Arya_m_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

Huang_m_18_l_T <- Huang_m_18_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18  = T)

Huang_m_25_l_T <- Huang_m_25_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

Huang_m_28_l_T <- Huang_m_28_l_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28  = T)

all_e0_t_stats <-
  Arya_f_l_T %>%
  inner_join(Huang_f_18_l_T, by = "SNP") %>%
  inner_join(Huang_f_25_l_T, by = "SNP") %>%
  inner_join(Huang_f_28_l_T, by = "SNP") %>% 
  inner_join(Wilson_f_l_1_T, by = "SNP") %>% 
  inner_join(Wilson_f_l_2_T, by = "SNP") %>% 
  inner_join(Durham_f_l_T, by = "SNP") %>% 
  inner_join(Patel_f_l_T, by = "SNP") %>% 
  inner_join(Arya_m_l_T, by = "SNP") %>% 
  inner_join(Huang_m_18_l_T, by = "SNP") %>% 
  inner_join(Huang_m_25_l_T, by = "SNP") %>%
  inner_join(Huang_m_28_l_T, by = "SNP")

all_e0_t_stats_values <-
  all_e0_t_stats %>% 
  dplyr::select(2:13)

Sample_size_all <- c(165, 183, 186, 177, 161, 161, 176, 193, 165, 183, 186, 177) 

if(!file.exists("data/Derived/GWAS_results/all_e0_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_all, SNP_e0_corr_matrix);

S_het <- SHet(all_e0_t_stats_values, Sample_size_all, SNP_e0_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)

# bind meta statistics with the univariate effect sizes

all_e0_meta_results <- 
  all_e0_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) 

write_csv(all_e0_meta_results, "data/Derived/GWAS_results/all_e0_meta_results.csv")

} else all_e0_meta_results <- read_csv("data/Derived/GWAS_results/all_e0_meta_results.csv")

```

#### Lifespan equality

```{r}
Arya_f_h_T <- Arya_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
  
Huang_f_18_h_T <- Huang_f_18_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)

Huang_f_25_h_T <- Huang_f_25_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)

Huang_f_28_h_T <- Huang_f_28_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)

Wilson_f_h_1_T <- Wilson_f_h_1_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_1 = T)

Wilson_f_h_2_T <- Wilson_f_h_2_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Wilson_f_25_2 = T)

Durham_f_h_T <- Durham_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Durham_f_25 = T)

Patel_f_h_T <- Patel_f_h_GWAS %>% 
  dplyr::select(SNP, T) %>%  
  rename(Patel_f_23 = T)

Arya_m_h_T <- Arya_m_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

Huang_m_18_h_T <- Huang_m_18_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18  = T)

Huang_m_25_h_T <- Huang_m_25_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

Huang_m_28_h_T <- Huang_m_28_h_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28  = T)


all_h_t_stats <-
  Arya_f_h_T %>%
  inner_join(Huang_f_18_h_T, by = "SNP") %>%
  inner_join(Huang_f_25_h_T, by = "SNP") %>%
  inner_join(Huang_f_28_h_T, by = "SNP") %>% 
  inner_join(Wilson_f_h_1_T, by = "SNP") %>%
  inner_join(Wilson_f_h_2_T, by = "SNP") %>% 
  inner_join(Durham_f_h_T, by = "SNP") %>% 
  inner_join(Patel_f_h_T, by = "SNP") %>% 
  inner_join(Arya_m_h_T, by = "SNP") %>% 
  inner_join(Huang_m_18_h_T, by = "SNP") %>% 
  inner_join(Huang_m_25_h_T, by = "SNP") %>%
  inner_join(Huang_m_28_h_T, by = "SNP") 
  

all_h_t_stats_values <-
  all_h_t_stats %>% 
  dplyr::select(2:13)

if(!file.exists("data/Derived/GWAS_results/all_h_meta_results.csv")) {

S_hom <- SHom(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary (potentially in sign) for different traits e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_all, SNP_h_corr_matrix);

S_het <- SHet(all_h_t_stats_values, Sample_size_all, SNP_h_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)

# bind meta statistics with the univariate effect sizes

all_h_meta_results <- 
  all_h_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het)

write_csv(all_h_meta_results, "data/Derived/GWAS_results/all_h_meta_results.csv")
} else all_h_meta_results <- read_csv("data/Derived/GWAS_results/all_h_meta_results.csv")

```

## Visualise the results

We combine GWAS summary statistics calculated from lifespan data measured across different contexts. It's likely that some SNPs have G x E interactions that would lead to a heterogeneous effect across treatments. We therefore utilise the `S_het` calculated p-values.

First lets show the effect of `CPASSOC` on the number of variants found to be associated with life expectancy and lifespan equality.

**Table SX**. the number of variants associated with life expectancy and lifespan equality at various significance thresholds, estimated by univariate GWAS or CPASSOC.

```{r}
tibble(Analysis = c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),
       Trait = c("Life expectancy", "Life expectancy", "Lifespan equality", "Lifespan equality"),
       `p < 1e-05 variants` = c(sum(all_e0_meta_results$meta_p_het < 1e-05),
                                nrow(p_05_SNPs_l),
                                sum(all_h_meta_results$meta_p_het < 1e-05),
                                nrow(p_05_SNPs_h)),
       `p < 1e-05 genomic regions` = c(nrow(all_e0_meta_results %>% filter(meta_p_het < 1e-05) %>% inner_join(Genomic_regions)),
                                          nrow(p_05_SNPs_l %>% filter(Pruned_variant == "YES")),
                                          nrow(all_h_meta_results %>% filter(meta_p_het < 1e-05) %>% inner_join(Genomic_regions)),
                                          nrow(p_05_SNPs_h %>% filter(Pruned_variant == "YES"))),
       `p < 1e-08 variants` = c(sum(all_e0_meta_results$meta_p_het < 1e-08),
                                sum(e0_table$`p < 1e-08 variants`),
                                sum(all_h_meta_results$meta_p_het < 1e-08),
                                sum(h_table$`p < 1e-08 variants`)),
       `p < 1e-08 genomic regions` = c(nrow(all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% inner_join(Genomic_regions)),
                                          sum(h_table$`p < 1e-08 variants`),
                                          nrow(all_h_meta_results %>% filter(meta_p_het < 1e-08) %>% inner_join(Genomic_regions)),
                                          sum(h_table$`p < 1e-08 variants`)))  %>% 
  kable() %>% 
  kable_styling()
  
```

**Table SX**. genes that encompass or are very close to the genetic variants that have strong associations with life expectancy.

```{r}
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

life_expectancy_variants <-
  all_e0_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^18, 3)/10^18,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

life_expectancy_variants %>% 
  my_data_table()

```

**Table SX**. genes that encompass or are very close to the genetic variants that have strong associations with lifespan equality.

```{r}
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

lifespan_equality_variants <-
  all_h_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^15, 3)/10^15,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

lifespan_equality_variants %>% 
  my_data_table()
```

Now build some 'Manhattan plots' to show where these significant associations can be found:

```{r, fig.width=11, eval=TRUE}
#| column: page

e0_results <- 
  all_e0_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

h_results <- 
  all_h_meta_results %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

# plot the results using the manhattan plot custom function we defined earlier

e0_het_plot <- build_manhattan_plot(e0_results) +
  labs(title = "Life expectancy") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 21), expand = c(0, 0))

h_het_plot <- build_manhattan_plot(h_results) +
  labs(title = "Lifespan equality") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 21), expand = c(0, 0))

e0_het_plot + h_het_plot + plot_annotation(tag_levels = "A")
```

**Figure XX**. Manhattan plot showing the -Log~10~ *p*-value for each genomic region's effect on A) life expectancy and B) lifespan equality.

Plot the univariate effect sizes for each of the regions associated with life expectancy / lifespan equality at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC.

**Life expectancy**

```{r, fig.height=9}

SNP_heatmap_e0 <-
  SNP_beta_e0 %>% 
  inner_join(
    all_e0_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_e0$SNP

SNP_heatmap_e0 <- SNP_heatmap_e0 %>% dplyr::select(-SNP) %>% as.matrix()

rownames(SNP_heatmap_e0) <- row_name

breaksList <- seq(-7, 7, by = 0.01)

annotation_SNPs <- 
  all_e0_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_f_25",
                   "Huang_f_18",
                   "Huang_f_25",
                   "Huang_f_28",
                   "Wilson_f_25_1",
                   "Wilson_f_25_2",
                   "Durham_f_25",
                   "Patel_f_23",
                   "Arya_m_25",
                   "Huang_m_18",
                   "Huang_m_25",
                   "Huang_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")


pheatmap(SNP_heatmap_e0, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)

```

**Figure SX**. univariate effect sizes for each of the genomic regions associated with life expectancy at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Effect sizes are expressed in days added to life expectancy per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.

**Lifespan equality**

```{r}
SNP_heatmap_h <-
  SNP_beta_h %>% 
  inner_join(
    all_h_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_h$SNP
SNP_heatmap_h <- SNP_heatmap_h %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_h) = row_name

breaksList <- seq(-0.15, 0.15, by = 0.001)

annotation_SNPs_h <- 
  all_h_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))


# create a SNP annotation column

SNP_details_h <- annotation_SNPs_h %>%
  as.data.frame()

my_SNP_categories_h <- data.frame(row.names = SNP_details_h[, 1], Chromosome = SNP_details_h[, 2])

pheatmap(SNP_heatmap_h, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 3, cutree_cols = 4, angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, 
         annotation_row = my_SNP_categories_h,
         fontsize = 8, labels_col = my_col_names)

```

**Figure XX**. univariate effect sizes for each of the genomic regions associated with lifespan equality at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Effect sizes are expressed in **equality added** per major allele copy. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis.

# Analysing the rate of ageing and baseline mortality

## Axes of ageing rate and baseline mortality

We've shown that orthogonal deviation from the regression of lifespan equality on life expectancy closely corresponds to the rate of ageing ($\beta$) parameter in a Gompertz-Makeham mortality model. To identify regions of the genome associated with the rate of ageing, we can calculate a rate of ageing index for each line in each treatment. To create this index, we rotate the coordinate system of the life expectancy and lifespan equality plane by $\theta$ degrees, where $\theta$ is the angle between the positive x-axis and the regression of lifespan equality on life expectancy.

**Finding the slopes**

```{r}
# create a dataframe with which mean regression lines can be predicted from each model. It spans 4 SDs in either direction.

 nd <- 
  tibble(e0 = seq(from = 0, to = 180, length.out = 180))

# fit the models

Arya_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Arya_2010_f, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Arya_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Arya_f_slope <-
  as_draws_df(Arya_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Arya_regression_line_f <-
   fitted(Arya_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Arya_m_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Arya_2010_m, chains = 4, cores = 4, file = "data/Derived/Ageing_axis_slopes/Arya_m_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Arya_m_slope <-
  as_draws_df(Arya_m_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Arya_regression_line_m <-
   fitted(Arya_m_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_18_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_18, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_18_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_18_slope <-
  as_draws_df(Huang_f_18_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_18_regression_line <-
   fitted(Huang_f_18_model, newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_18_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_18, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_18_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_18_slope <-
  as_draws_df(Huang_m_18_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_18_regression_line <-
   fitted(Huang_m_18_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_25_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_25, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_25_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_25_slope <-
  as_draws_df(Huang_f_25_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_25_regression_line <-
   fitted(Huang_f_25_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_25_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_25, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_25_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_25_slope <-
  as_draws_df(Huang_m_25_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_25_regression_line <-
   fitted(Huang_m_25_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_f_28_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_f_28, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_f_28_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_f_28_slope <-
  as_draws_df(Huang_f_28_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_f_28_regression_line <-
   fitted(Huang_f_28_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Huang_m_28_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Huang_2020_m_28, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Huang_m_28_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Huang_m_28_slope <-
  as_draws_df(Huang_m_28_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Huang_m_28_regression_line <-
   fitted(Huang_m_28_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Wilson_f_model_1 <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Wilson_2020_f_1, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Wilson_f_slope_1",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Wilson_f_slope_1 <-
  as_draws_df(Wilson_f_model_1) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Wilson_f_regression_line_1 <-
   fitted(Wilson_f_model_1,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Wilson_f_model_2 <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Wilson_2020_f_2, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Wilson_f_slope_2",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Wilson_f_slope_2 <-
  as_draws_df(Wilson_f_model_2) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Wilson_f_regression_line_2 <-
   fitted(Wilson_f_model_2,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

Durham_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Durham_2014_f, chains = 4, cores = 4, 
            file = "data/Derived/Ageing_axis_slopes/Durham_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Durham_f_slope <-
  as_draws_df(Durham_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)

 Durham_f_regression_line <-
   fitted(Durham_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)


Patel_f_model <- brm(h ~ 1 + e0,
            #prior = c(prior(normal(0, 0.1), class = Intercept),
             #         prior(normal(0, 1), class = b),
              #        prior(exponential(1), class = sigma)),
            family = gaussian,
            iter = 6000, warmup = 2000,
            control = list(adapt_delta = 0.8, max_treedepth = 10),
            data = Patel_2021_f, chains = 4, cores = 4, file = "data/Derived/Ageing_axis_slopes/Patel_f_slope",
            backend = "cmdstanr", stan_model_args = list(stanc_options = list("O1")),
            refresh = 400, silent = 0, seed = 1)

Patel_f_slope <-
  as_draws_df(Patel_f_model) %>% 
  as_tibble() %>% 
  dplyr::select(b_e0) %>% 
  summarise(slope = mean(b_e0)) %>% pull(slope)
 
 Patel_regression_line <-
   fitted(Patel_f_model,
        newdata = nd) %>% 
   data.frame() %>% 
   bind_cols(nd) %>% 
   dplyr::select(Estimate, e0)

```

With regression coefficients found, we use the following formula to calculate the angle (in radians) between the mean regression line and the x-axis:

$\theta = tan^{-1}(\beta)$

where $\beta$ is the point estimate of the slope from each model posterior distribution.

```{r}
Arya_f_angle <- atan(Arya_f_slope)
Arya_m_angle <- atan(Arya_m_slope)
Huang_f_18_angle <- atan(Huang_f_18_slope)
Huang_m_18_angle <- atan(Huang_m_18_slope)
Huang_f_25_angle <- atan(Huang_f_25_slope)
Huang_m_25_angle <- atan(Huang_m_25_slope)
Huang_f_28_angle <- atan(Huang_f_28_slope)
Huang_m_28_angle <- atan(Huang_m_28_slope)
Wilson_f_1_angle <- atan(Wilson_f_slope_1)
Wilson_f_2_angle <- atan(Wilson_f_slope_2)
Durham_f_angle <- atan(Durham_f_slope)
Patel_f_angle <- atan(Patel_f_slope)
```

We then rotated the coordinate system of the life expectancy and lifespan equality plane clockwise by $\theta$:

$$x' = -(x\cos(\theta) + y\sin(\theta))$$ 

$$y' = -(x\sin(\theta) - y\cos(\theta))$$

where $x'$ and $y'$ are the vectors of genotype means for baseline mortality rate and ageing rate, and $x$ and $y$ are vectors of genotype means for life expectancy and lifespan equality. We perform this transformation on the unscaled data.

```{r}
Arya_2010_f <-
  Arya_2010_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Arya_f_angle) + h*sin(Arya_f_angle)),
         ageing_axis = -1*(e0*sin(Arya_f_angle) - h*cos(Arya_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Arya_2010_m <-
  Arya_2010_m %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Arya_m_angle) + h*sin(Arya_m_angle)),
         ageing_axis = -1*(e0*sin(Arya_m_angle) - h*cos(Arya_m_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_18 <-
  Huang_2020_f_18 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_18_angle) + h*sin(Huang_f_18_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_18_angle) - h*cos(Huang_f_18_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_18 <-
  Huang_2020_m_18 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_18_angle) + h*sin(Huang_m_18_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_18_angle) - h*cos(Huang_m_18_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_25 <-
  Huang_2020_f_25 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_25_angle) + h*sin(Huang_f_25_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_25_angle) - h*cos(Huang_f_25_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_25 <-
  Huang_2020_m_25 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_25_angle) + h*sin(Huang_m_25_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_25_angle) - h*cos(Huang_m_25_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_f_28 <-
  Huang_2020_f_28 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_f_28_angle) + h*sin(Huang_f_28_angle)),
         ageing_axis = -1*(e0*sin(Huang_f_28_angle) - h*cos(Huang_f_28_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Huang_2020_m_28 <-
  Huang_2020_m_28 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Huang_m_28_angle) + h*sin(Huang_m_28_angle)),
         ageing_axis = -1*(e0*sin(Huang_m_28_angle) - h*cos(Huang_m_28_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Wilson_2020_f_1 <-
  Wilson_2020_f_1 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Wilson_f_1_angle) + h*sin(Wilson_f_1_angle)),
         ageing_axis = -1*(e0*sin(Wilson_f_1_angle) - h*cos(Wilson_f_1_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Wilson_2020_f_2 <-
  Wilson_2020_f_2 %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Wilson_f_2_angle) + h*sin(Wilson_f_2_angle)),
         ageing_axis = -1*(e0*sin(Wilson_f_2_angle) - h*cos(Wilson_f_2_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Durham_2014_f <-
  Durham_2014_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Durham_f_angle) + h*sin(Durham_f_angle)),
         ageing_axis = -1*(e0*sin(Durham_f_angle) - h*cos(Durham_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))

Patel_2021_f <- 
  Patel_2021_f %>% 
    mutate(baseline_mortality_axis = -1*(e0*cos(Patel_f_angle) + h*sin(Patel_f_angle)),
         ageing_axis = -1*(e0*sin(Patel_f_angle) - h*cos(Patel_f_angle)),
         baseline_mortality_axis_centered = baseline_mortality_axis - mean(baseline_mortality_axis),
         ageing_axis_centered = ageing_axis - mean(ageing_axis))
```

Finally, simulate curves from for the Gompertz-Makeham distribution to show the correlation between the $\alpha$ and $\beta$ parametrs and our baseline mortality and ageing rate proxies.

```{r}
# script to draw h~e0 for different gompertz b
# a sequence
a_seq <- seq(-30,2,0.02)
# b sequence
b_seq <- seq(-5,-0.5,0.5)
b_seq <- exp(b_seq)

gomp_seq <- data.frame(b=NULL,e0=NULL,h=NULL)

age_seq <- seq(0,10000,0.1)

Run_sim <- FALSE # change to TRUE to run the sim

if(Run_sim){
  
  for(i in 1:length(b_seq)){
    for (j in 1:length(a_seq)){
      lx <- exp(-exp(a_seq[j])/b_seq[i]*(exp(b_seq[i]*age_seq)-1))
      lx <- lx[lx!=0]
      if(tail(lx,1)<0.1){
        e0_gomp <- sum(lx)*0.1
        disparity <- -sum(lx*log(lx))*0.1
        h_gomp <- -log(disparity/e0_gomp)
        
      }
      gomp_seq <- rbind(gomp_seq,c(b_seq[i],e0_gomp,h_gomp))
    }
  }
  write_csv(gomp_seq, "data/Derived/gompertz_simulation.csv")
} else{
  gomp_seq <- read_csv("data/Derived/gompertz_simulation.csv")}

names(gomp_seq) <- c("b","e0","h")

gomp_seq$b <- log(gomp_seq$b)

gomp_seq$b <- as.factor(gomp_seq$b)
```

Plot the line means, coloured by their value on the ageing rate axis.

```{r, fig.height=10, fig.width=12}

rotated_axis_plot <- function(data, regression_data, which_axis, fill_title, study_title, limit){
  data %>% 
    ggplot(aes(x = e0, y = h)) +
    geom_line(data = gomp_seq,
              aes(x = e0, y = h, group = b), alpha = 0.4, linetype = 2) +
    geom_point(aes(fill = which_axis), shape = 21, size = 4) +
    scale_fill_moma_c("Avedon", direction = -1, limits = c(-1*limit, limit)) +
    geom_smooth(data = regression_data,
                aes(y = Estimate),
                stat = "identity",
                alpha = 1/2, linewidth = 1) +
    scale_x_continuous(limits = c(5, 145), 
                       breaks = c(0, 30, 60, 90, 120), expand = c(0, 0)) +
    scale_y_continuous(limits = c(0.1, 3.5), 
                       breaks = c(1, 2, 3), expand = c(0, 0)) +
    labs(fill = fill_title,
         x = "Life expectancy",
         y = "Lifespan equality",
         title = study_title) +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5),
          panel.grid = element_blank(),
          axis.title.y = element_markdown(size = 12),
          axis.title.x = element_markdown(size = 12),
          axis.text.x = element_text(size = 10),
          axis.text.y = element_text(size = 10))
}

a <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, which_axis = Arya_2010_f$ageing_axis_centered, 
                 "Ageing\nrate", "Arya 25C females", limit = 1.6)

a.1 <- rotated_axis_plot(Arya_2010_f, Arya_regression_line_f, 
                        which_axis = Arya_2010_f$baseline_mortality_axis_centered, 
                        "Baseline\nmortality", "Arya 25C females", limit = 60)

b <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, which_axis = Arya_2010_m$ageing_axis_centered, 
                 "Ageing\nrate", "Arya 25C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))

b.1 <- rotated_axis_plot(Arya_2010_m, Arya_regression_line_m, 
                         which_axis = Arya_2010_m$baseline_mortality_axis_centered, 
                 "Baseline\nmortality", "Arya 25C males", limit = 60) #+
  #coord_cartesian(xlim = c(20, 80), ylim = c(0.7, 3.3))

c <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, 
                       which_axis = Huang_2020_f_18$ageing_axis_centered,
                      "Ageing\nrate", "Huang 18C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))

c.1 <- rotated_axis_plot(Huang_2020_f_18, Huang_f_18_regression_line, 
                       which_axis = Huang_2020_f_18$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 18C females", limit = 60) #+
  #coord_cartesian(xlim = c(20, 135), ylim = c(0.4, 2.5))

d <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, 
                       which_axis = Huang_2020_m_18$ageing_axis_centered,
                      "Ageing\nrate", "Huang 18C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))

d.1 <- rotated_axis_plot(Huang_2020_m_18, Huang_m_18_regression_line, 
                       which_axis = Huang_2020_m_18$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 18C males", limit = 60) #+
  #coord_cartesian(xlim = c(30, 140), ylim = c(0.3, 2.5))

e <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, 
                       which_axis = Huang_2020_f_25$ageing_axis_centered,
                      "Ageing\nrate", "Huang 25C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))

e.1 <- rotated_axis_plot(Huang_2020_f_25, Huang_f_25_regression_line, 
                       which_axis = Huang_2020_f_25$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 25C females", limit = 60) #+
  #coord_cartesian(xlim = c(10, 70), ylim = c(0.4, 3))

f <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, 
                       which_axis = Huang_2020_m_25$ageing_axis_centered,
                      "Ageing\nrate", "Huang 25C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))

f.1 <- rotated_axis_plot(Huang_2020_m_25, Huang_m_25_regression_line, 
                       which_axis = Huang_2020_m_25$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 25C males", limit = 60) #+
  #coord_cartesian(xlim = c(15, 80), ylim = c(0.5, 2.5))

g <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, 
                       which_axis = Huang_2020_f_28$ageing_axis_centered,
                      "Ageing\nrate", "Huang 28C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

g.1 <- rotated_axis_plot(Huang_2020_f_28, Huang_f_28_regression_line, 
                       which_axis = Huang_2020_f_28$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 28C females", limit = 60) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

h <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, 
                       which_axis = Huang_2020_m_28$ageing_axis_centered,
                      "Ageing\nrate", "Huang 28C males", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

h.1 <- rotated_axis_plot(Huang_2020_m_28, Huang_m_28_regression_line, 
                       which_axis = Huang_2020_m_28$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Huang 28C males", limit = 60) #+
  #coord_cartesian(xlim = c(5, 45), ylim = c(0.2, 3.1))

i <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, 
                       which_axis = Wilson_2020_f_1$ageing_axis_centered,
                      "Ageing\nrate", "Wilson 25C females 1", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))

i.1 <- rotated_axis_plot(Wilson_2020_f_1, Wilson_f_regression_line_1, 
                       which_axis = Wilson_2020_f_1$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Wilson 25C females 1", limit = 60) #+
  #coord_cartesian(xlim = c(15, 75), ylim = c(0.4, 2.5))

j <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, 
                       which_axis = Wilson_2020_f_2$ageing_axis_centered,
                      "Ageing\nrate", "Wilson 25C females 2", limit = 1.6) #+
  #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))

j.1 <- rotated_axis_plot(Wilson_2020_f_2, Wilson_f_regression_line_2, 
                       which_axis = Wilson_2020_f_2$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Wilson 25C females 2", limit = 60) #+
  #coord_cartesian(xlim = c(5, 55), ylim = c(0.1, 2.5))

k <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, 
                       which_axis = Durham_2014_f$ageing_axis_centered,
                      "Ageing\nrate", "Durham 25C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))

k.1 <- rotated_axis_plot(Durham_2014_f, Durham_f_regression_line, 
                       which_axis = Durham_2014_f$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Durham 25C females", limit = 60) #+
  #coord_cartesian(xlim = c(15, 65), ylim = c(1.1, 2.3))

l <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, 
                       which_axis = Patel_2021_f$ageing_axis_centered,
                      "Ageing\nrate", "Patel 23C females", limit = 1.6) #+
  #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))

l.1 <- rotated_axis_plot(Patel_2021_f, Patel_regression_line, 
                       which_axis = Patel_2021_f$baseline_mortality_axis_centered,
                      "Baseline\nmortality", "Patel 23C females", limit = 60) #+
  #coord_cartesian(xlim = c(10, 75), ylim = c(0.1, 3.3))
```

```{r}
(a | b | c) / (d | e | f) / (g | h| i) / (j | k | l) + #guide_area()) +
  plot_layout(guides = 'collect')

```

**Figure SX**. Points show DGRP lines, shaded by their genotypic values for the rate of ageing. Dashed curves show simulation outcomes from a Gompertz-Makeham distribution: the rate of ageing differs between curves but is fixed within them, where the baseline mortality decreases as curves progress to the right. Note that colour shows the rate of ageing relative to the mean within the treatment.

```{r}
(a.1 | b.1 | c.1) / (d.1 | e.1 | f.1) / (g.1 | h.1| i.1) / (j.1 | k.1 | l.1) + #guide_area()) +
  plot_layout(guides = 'collect')
```

**Figure SX**. As per Figure SX, except colours indicate our proxy for the baseline rate of ageing.

## Run univariate GWAS

Conduct GWAS and save the results.

```{r}
Arya_f_ageing <- prep_for_ageing_GWAS(Arya_2010_f)
Arya_m_ageing <- prep_for_ageing_GWAS(Arya_2010_m)
Huang_f_18_ageing <- prep_for_ageing_GWAS(Huang_2020_f_18)
Huang_m_18_ageing <- prep_for_ageing_GWAS(Huang_2020_m_18)
Huang_f_25_ageing <- prep_for_ageing_GWAS(Huang_2020_f_25)
Huang_m_25_ageing <- prep_for_ageing_GWAS(Huang_2020_m_25)
Huang_f_28_ageing <- prep_for_ageing_GWAS(Huang_2020_f_28)
Huang_m_28_ageing <- prep_for_ageing_GWAS(Huang_2020_m_28)
Wilson_f_ageing_1 <- prep_for_ageing_GWAS(Wilson_2020_f_1)
Wilson_f_ageing_2 <- prep_for_ageing_GWAS(Wilson_2020_f_2)
Durham_f_ageing <- prep_for_ageing_GWAS(Durham_2014_f)
Patel_f_ageing <- prep_for_ageing_GWAS(Patel_2021_f)

Arya_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Arya_2010_f)
Arya_m_baseline_mortality <- prep_for_baseline_mortality_GWAS(Arya_2010_m)
Huang_f_18_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_18)
Huang_m_18_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_18)
Huang_f_25_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_25)
Huang_m_25_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_25)
Huang_f_28_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_f_28)
Huang_m_28_baseline_mortality <- prep_for_baseline_mortality_GWAS(Huang_2020_m_28)
Wilson_f_baseline_mortality_1 <- prep_for_baseline_mortality_GWAS(Wilson_2020_f_1)
Wilson_f_baseline_mortality_2 <- prep_for_baseline_mortality_GWAS(Wilson_2020_f_2)
Durham_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Durham_2014_f)
Patel_f_baseline_mortality <- prep_for_baseline_mortality_GWAS(Patel_2021_f)

if(!file.exists("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz")) {
run_GWAS(Arya_f_ageing)
run_GWAS(Arya_m_ageing)
run_GWAS(Huang_f_18_ageing)
run_GWAS(Huang_m_18_ageing)
run_GWAS(Huang_f_25_ageing)
run_GWAS(Huang_m_25_ageing)
run_GWAS(Huang_f_28_ageing)
run_GWAS(Huang_m_28_ageing)
run_GWAS(Wilson_f_ageing_1)
run_GWAS(Wilson_f_ageing_2)
run_GWAS(Durham_f_ageing)
run_GWAS(Patel_f_ageing)

run_GWAS(Arya_f_baseline_mortality)
run_GWAS(Arya_m_baseline_mortality)
run_GWAS(Huang_f_18_baseline_mortality)
run_GWAS(Huang_m_18_baseline_mortality)
run_GWAS(Huang_f_25_baseline_mortality)
run_GWAS(Huang_m_25_baseline_mortality)
run_GWAS(Huang_f_28_baseline_mortality)
run_GWAS(Huang_m_28_baseline_mortality)
run_GWAS(Wilson_f_baseline_mortality_1)
run_GWAS(Wilson_f_baseline_mortality_2)
run_GWAS(Durham_f_baseline_mortality)
run_GWAS(Patel_f_baseline_mortality)
}

Arya_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_ageing.tsv.gz") 
Arya_m_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_ageing.tsv.gz") 
Huang_f_18_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_ageing.tsv.gz")
Huang_m_18_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_ageing.tsv.gz")
Huang_f_25_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_ageing.tsv.gz")
Huang_m_25_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_ageing.tsv.gz")
Huang_f_28_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_ageing.tsv.gz")
Huang_m_28_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_ageing.tsv.gz")
Wilson_f_ageing_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_1.tsv.gz")
Wilson_f_ageing_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_ageing_2.tsv.gz")
Durham_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_ageing.tsv.gz")
Patel_f_ageing_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_ageing.tsv.gz")

Arya_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_f_baseline_mortality.tsv.gz") 
Arya_m_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Arya_m_baseline_mortality.tsv.gz") 
Huang_f_18_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_18_baseline_mortality.tsv.gz")
Huang_m_18_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_18_baseline_mortality.tsv.gz")
Huang_f_25_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_25_baseline_mortality.tsv.gz")
Huang_m_25_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_25_baseline_mortality.tsv.gz")
Huang_f_28_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_f_28_baseline_mortality.tsv.gz")
Huang_m_28_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Huang_m_28_baseline_mortality.tsv.gz")
Wilson_f_baseline_mortality_1_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_1.tsv.gz")
Wilson_f_baseline_mortality_2_GWAS <- read_tsv("data/Derived/GWAS_results/Wilson_f_baseline_mortality_2.tsv.gz")
Durham_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Durham_f_baseline_mortality.tsv.gz")
Patel_f_baseline_mortality_GWAS <- read_tsv("data/Derived/GWAS_results/Patel_f_baseline_mortality.tsv.gz")

```

**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for the **rate of ageing**. The number of genomic regions indicates the number of genetic variants associated with the rate of ageing after LD pruning. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.

```{r}
# filter down to sig associations
ageing_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_ageing_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_ageing_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_ageing_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_ageing_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_ageing_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_ageing_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_ageing_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_ageing_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_ageing_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_ageing_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_ageing_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
ageing_p_05_SNPs <-
  bind_rows(
    
    Arya_f_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_ageing_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_ageing_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_ageing_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_ageing_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

ageing_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(ageing_p_05_SNPs),
          `p < 1e-05 genomic regions` = nrow(ageing_p_05_SNPs %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(ageing_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(ageing_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()

```

**Table SX**. Genotype to phenotype associations detected by univariate GWAS, for **baseline mortality rate**. The number of genomic regions indicates the number of genetic variants associated with baseline mortality after LD pruning. The total row shows the number of unique candidate variants identified across all studies. \*Wilson et al. phenotyped lifespan under two separate dietary conditions, which we include separately in our analysis.

```{r}
# filter down to sig associations
scaling_table <-
  bind_rows(
    tibble(`p < 1e-05 variants` = nrow(Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_18_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_18_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_25_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_25_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_f_28_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_f_28_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_baseline_mortality_1_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_1_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Wilson et al 2020",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Wilson_f_baseline_mortality_2_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Wilson_f_baseline_mortality_2_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Wilson et al 2020*",
             Treatment = "2",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Durham_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Durham_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>% 
      mutate(Study = "Durham et al 2014",
             Treatment = "1",
             Sex = "Female",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Patel_f_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Patel_f_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Patel et al 2021",
             Treatment = "1",
             Sex = "Female",
             Temperature = "23",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Arya_m_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Arya_m_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Arya et al 2010",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Virgin") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_18_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_18_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "18",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_25_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_25_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "25",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
    
    tibble(`p < 1e-05 variants` = nrow(Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-05)),
           `p < 1e-05 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-05))),
           `p < 1e-08 variants` = nrow(filter(Huang_m_28_baseline_mortality_GWAS, P < 1e-08)),
           `p < 1e-08 genomic regions` = nrow(inner_join(Genomic_regions, 
                                                            Huang_m_28_baseline_mortality_GWAS %>% filter(P < 1e-08)))) %>%
      mutate(Study = "Huang et al 2020",
             Treatment = "1",
             Sex = "Male",
             Temperature = "28",
             `Mating status` = "Mated") %>% 
      dplyr::select(Study, Sex, Temperature,
                    `p < 1e-05 variants`, `p < 1e-05 genomic regions`, 
                    `p < 1e-08 variants`, `p < 1e-08 genomic regions`),
  ) 

# how many unique variants have been detected?
scaling_p_05_SNPs <-
  bind_rows(
    
    Arya_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Arya_m_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_18_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_25_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_f_28_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_18_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_25_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Huang_m_28_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_baseline_mortality_1_GWAS %>% 
      filter(P < 1e-05),
    
    Wilson_f_baseline_mortality_2_GWAS %>% 
      filter(P < 1e-05),
    
    Durham_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05),
    
    Patel_f_baseline_mortality_GWAS %>% 
      filter(P < 1e-05)
  ) %>% 
  distinct(SNP) %>% 
  left_join(Genomic_regions %>% mutate(Pruned_variant = "YES")) 

scaling_table %>% 
  add_row(Study = "Totals",
          Sex = "",
          Temperature = "",
          `p < 1e-05 variants` = nrow(scaling_p_05_SNPs),
          `p < 1e-05 genomic regions` = nrow(scaling_p_05_SNPs %>% filter(Pruned_variant == "YES")),
          `p < 1e-08 variants` = sum(scaling_table$`p < 1e-08 variants`),
          `p < 1e-08 genomic regions` = sum(scaling_table$`p < 1e-08 genomic regions`)) %>% 
  kable() %>% 
  kable_styling()
```

## Applying cross-phenotype meta-analysis

### Generate the genetic correlation matrix

Using SNP effect sizes, we calculate the genetic correlations between a) rates of ageing and b) baseline mortality, measured in different environmental contexts.

```{r}
# use the BETA coefficients to build the SNP correlation matrix for the rate of ageing

SNP_ageing_axis_data <-
  bind_rows(
    Arya_f_ageing_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Female"),
    
    Arya_m_ageing_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Male"),
    
    Huang_f_18_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Female"),
    
    Huang_m_18_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Male"),
    
    Huang_f_25_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Female"),
    
    Huang_m_25_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Male"),
    
    Huang_f_28_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Female"),
    
    Huang_m_28_ageing_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Male"),
    
    Wilson_f_ageing_1_GWAS %>% 
      mutate(Study = "Wilson_2020_1", Temperature = 25, Sex = "Female"),
    
    Wilson_f_ageing_2_GWAS %>% 
      mutate(Study = "Wilson_2020_2", Temperature = 25, Sex = "Female"),
    
    Durham_f_ageing_GWAS %>% 
      mutate(Study = "Durham_2014", Temperature = 25, Sex = "Female"),
    
    Patel_f_ageing_GWAS %>% 
      mutate(Study = "Patel_2021", Temperature = 23, Sex = "Female")) %>% 
  dplyr::select(SNP, BETA, Study, Temperature, Sex) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Temperature, Sex)) 

SNP_ageing_axis_LD_pruned <- SNP_ageing_axis_data %>% inner_join(Genomic_regions)

SNP_ageing_axis_corr_matrix <- cor(SNP_ageing_axis_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")

# use the BETA coefficients to build the SNP correlation matrix for scaling

SNP_baseline_mortality_axis_data <-
 bind_rows(
    Arya_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Female"),
    
    Arya_m_baseline_mortality_GWAS %>% 
      mutate(Study = "Arya_2010", Temperature = 25, Sex = "Male"),
  
    Huang_f_18_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Female"),
    
    Huang_m_18_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 18, Sex = "Male"),
    
    Huang_f_25_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Female"),
    
    Huang_m_25_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 25, Sex = "Male"),
  
    Huang_f_28_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Female"),
    
    Huang_m_28_baseline_mortality_GWAS %>% 
      mutate(Study = "Huang_2020", Temperature = 28, Sex = "Male"),
    
     Wilson_f_baseline_mortality_1_GWAS %>% 
      mutate(Study = "Wilson_2020_1", Temperature = 25, Sex = "Female"),
    
    Wilson_f_baseline_mortality_2_GWAS %>% 
      mutate(Study = "Wilson_2020_2", Temperature = 25, Sex = "Female"),
    
    Durham_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Durham", Temperature = 25, Sex = "Female"),
  
    Patel_f_baseline_mortality_GWAS %>% 
      mutate(Study = "Patel", Temperature = 23, Sex = "Female")) %>% 
  dplyr::select(SNP, BETA, Study, Temperature, Sex) %>% 
  pivot_wider(values_from = BETA, names_from = c(Study, Temperature, Sex))

SNP_baseline_mortality_axis_LD_pruned <- SNP_baseline_mortality_axis_data %>% inner_join(Genomic_regions)


SNP_baseline_mortality_axis_corr_matrix <- cor(SNP_baseline_mortality_axis_LD_pruned %>% dplyr::select(-SNP), use = "pairwise.complete.obs", method = "spearman")

```

### Calculate meta-analytic test statistics

The purpose of these meta-analyses is to detect SNPs associated with 1) the rate of ageing and 2) baseline mortality rate.

**Run CPASSOC for the rate of ageing**

```{r}
# rate of ageing

ageing_axis_Arya_f_T <- 
  Arya_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
    
ageing_axis_Arya_m_T <- 
  Arya_m_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

ageing_axis_Huang_f_18_T <- 
  Huang_f_18_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)
  
ageing_axis_Huang_m_18_T <- 
  Huang_m_18_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18 = T)

ageing_axis_Huang_f_25_T <- 
  Huang_f_25_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)
  
ageing_axis_Huang_m_25_T <- 
  Huang_m_25_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

ageing_axis_Huang_f_28_T <- 
  Huang_f_28_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)
  
ageing_axis_Huang_m_28_T <- 
  Huang_m_28_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28 = T)
    
ageing_axis_Wilson_f_1_T <- 
  Wilson_f_ageing_1_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_1 = T)

ageing_axis_Wilson_f_2_T <- 
  Wilson_f_ageing_2_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_2 = T)

ageing_axis_Durham_f_T <- 
  Durham_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Durham_f = T)

ageing_axis_Patel_f_T <- 
  Patel_f_ageing_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Patel_f = T)
    

ageing_axis_t_stats <-
  ageing_axis_Arya_f_T %>%
  inner_join(ageing_axis_Arya_m_T, by = "SNP") %>%
  inner_join(ageing_axis_Huang_f_18_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_18_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_f_25_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_25_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_f_28_T, by = "SNP") %>% 
  inner_join(ageing_axis_Huang_m_28_T, by = "SNP") %>% 
  inner_join(ageing_axis_Wilson_f_1_T, by = "SNP") %>%
  inner_join(ageing_axis_Wilson_f_2_T, by = "SNP") %>%
  inner_join(ageing_axis_Durham_f_T, by = "SNP") %>%
  inner_join(ageing_axis_Patel_f_T, by = "SNP") 

ageing_axis_t_stat_values <-
  ageing_axis_t_stats %>% 
  dplyr::select(2:13)

Sample_size_ageing_axis <- c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)

if(!file.exists("data/Derived/GWAS_results/ageing_axis_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits e.g. if a SNP has a sex or environment opposite effect on lifespan

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix);

S_het <- SHet(ageing_axis_t_stat_values, Sample_size_ageing_axis, SNP_ageing_axis_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)


ageing_axis_meta_results <- 
  ageing_axis_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) # add the unadjusted p values

write_csv(ageing_axis_meta_results, "data/Derived/GWAS_results/ageing_axis_meta_results.csv")

} else ageing_axis_meta_results <- read_csv("data/Derived/GWAS_results/ageing_axis_meta_results.csv")

```

**Run CPASSOC for the baseline rate of mortality**

```{r}
baseline_mortality_axis_Arya_f_T <- 
  Arya_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_f = T)
    
baseline_mortality_axis_Arya_m_T <- 
  Arya_m_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Arya_m = T)

baseline_mortality_axis_Huang_f_18_T <- 
  Huang_f_18_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_18 = T)
  
baseline_mortality_axis_Huang_m_18_T <- 
  Huang_m_18_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_18 = T)

baseline_mortality_axis_Huang_f_25_T <- 
  Huang_f_25_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_25 = T)
  
baseline_mortality_axis_Huang_m_25_T <- 
  Huang_m_25_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_25 = T)

baseline_mortality_axis_Huang_f_28_T <- 
  Huang_f_28_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_f_28 = T)
  
baseline_mortality_axis_Huang_m_28_T <- 
  Huang_m_28_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Huang_m_28 = T)
    
baseline_mortality_axis_Wilson_f_1_T <- 
  Wilson_f_baseline_mortality_1_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_1 = T)

baseline_mortality_axis_Wilson_f_2_T <- 
  Wilson_f_baseline_mortality_2_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Wilson_f_2 = T)

baseline_mortality_axis_Durham_f_T <- 
  Durham_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Durham_f = T)

baseline_mortality_axis_Patel_f_T <- 
  Patel_f_baseline_mortality_GWAS %>% 
  dplyr::select(SNP, T) %>% 
  rename(Patel_f = T)
    

baseline_mortality_axis_t_stats <-
  baseline_mortality_axis_Arya_f_T %>%
  inner_join(baseline_mortality_axis_Arya_m_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Huang_f_18_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_18_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_f_25_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_25_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_f_28_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Huang_m_28_T, by = "SNP") %>% 
  inner_join(baseline_mortality_axis_Wilson_f_1_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Wilson_f_2_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Durham_f_T, by = "SNP") %>%
  inner_join(baseline_mortality_axis_Patel_f_T, by = "SNP") 


baseline_mortality_axis_t_stat_values <-
  baseline_mortality_axis_t_stats %>% 
  dplyr::select(2:13)

Sample_size_baseline_mortality_axis <- c(165, 165, 183, 183, 186, 186, 177, 177, 161, 161, 176, 193)

if(!file.exists("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")) {

# run the homogeneous effect meta-analysis

S_hom <- SHom(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)

# calculate meta-p-values and bind the two together with the SNP names

p_S_hom <- pchisq(S_hom, df = 1, ncp = 0, lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_hom) %>% 
  rename(meta_p_hom = value, 
         S_hom = ...2)

# Calculate S_het, an extension of S_hom that improves power when the genetic effect sizes vary for different traits (e.g. if a SNP has a sex or enviornment opposite effect on lifespan)

# estimate parameters of gamma distribution

para <- EstimateGamma(N = 1E4, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix);

S_het <- SHet(baseline_mortality_axis_t_stat_values, Sample_size_baseline_mortality_axis, SNP_baseline_mortality_axis_corr_matrix)

# obtain P-values of S_Het using the estimated gamma parameters
  
p_S_het <- pgamma(q = S_het-para[3], shape = para[1], scale = para[2], lower.tail = F) %>% 
  as_tibble() %>% 
  bind_cols(S_het) %>% 
  rename(meta_p_het = value, 
         S_het = ...2)


baseline_mortality_axis_meta_results <- 
  baseline_mortality_axis_t_stats %>% 
  bind_cols(p_S_hom,
            p_S_het) # add the unadjusted p values

write_csv(baseline_mortality_axis_meta_results, "data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")

} else baseline_mortality_axis_meta_results <- read_csv("data/Derived/GWAS_results/baseline_mortality_axis_meta_results.csv")

```

## Visualise the results

We combine GWAS $T$ statistics calculated for the rate of ageing and baseline mortality measured across different contexts. It's possible that some SNPs have G x E interactions that lead to a heterogeneous effect across phenotypes. We therefore utilise the `S_het` calculated p-values.

First lets show the effect of `CPASSOC` on the number of variants found to be associated with the rate of ageing and the scaling of mortality risk.

**Table SX**. the number of variants associated with ageing rate and baseline mortality at various significance thresholds, estimated by univariate GWAS or CPASSOC. The number of genomic regions indicates the number of variants detected after LD pruning.

```{r}
tibble(Analysis = c("CPASSOC", "Univariate", "CPASSOC", "Univariate"),
       Trait = c("Ageing rate", "Ageing rate", "Scaling", "Scaling"),
       `p < 1e-05 variants` = c(sum(ageing_axis_meta_results$meta_p_het < 1e-05),
                                nrow(ageing_p_05_SNPs),
                                sum(baseline_mortality_axis_meta_results$meta_p_het < 1e-05),
                                nrow(scaling_p_05_SNPs)),
       `p < 1e-05 genomic regions` = c(nrow(ageing_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-05) %>%
                                                 inner_join(Genomic_regions)),
                                          nrow(ageing_p_05_SNPs %>% filter(Pruned_variant == "YES")),
                                          nrow(baseline_mortality_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-05) %>%
                                                 inner_join(Genomic_regions)),
                                          nrow(scaling_p_05_SNPs %>% filter(Pruned_variant == "YES"))),
       `p < 1e-08 variants` = c(sum(ageing_axis_meta_results$meta_p_het < 1e-08),
                                sum(ageing_table$`p < 1e-08 variants`),
                                sum(baseline_mortality_axis_meta_results$meta_p_het < 1e-08),
                                sum(scaling_table$`p < 1e-08 variants`)),
       `p < 1e-08 genomic regions` = c(nrow(ageing_axis_meta_results %>% 
                                                 filter(meta_p_het < 1e-08) %>% 
                                                 inner_join(Genomic_regions)),
                                          sum(ageing_table$`p < 1e-08 genomic regions`),
                                          nrow(baseline_mortality_axis_meta_results %>%
                                                 filter(meta_p_het < 1e-08) %>%
                                                 inner_join(Genomic_regions)),
                                          sum(scaling_table$`p < 1e-08 genomic regions`)))  %>% 
  kable() %>% 
  kable_styling()

```

**Table SX**. genes that encompass or are very close to the genetic variants that have associations with the rate of ageing.

```{r}
# join gene annotations with the list of analysed variants 
# note that some SNPs are associated with >1 gene, because the gene annotations overlap (I think) or the variant is close to multiple annotated genes. Others are not near any known genes, producing NAs.

ageing_rate_genes <-
  ageing_axis_meta_results %>%
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^9, 3)/10^9,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

ageing_rate_genes %>% 
  my_data_table()
```

**Table SX**. genes that encompass or are very close to the genetic variants that have associations with baseline mortality rate.

```{r}
scaling_genes <-
  baseline_mortality_axis_meta_results %>% 
  filter(meta_p_het < 1e-08) %>% 
  dplyr::select(SNP, S_het, meta_p_het) %>%
  left_join(annotations %>% filter(distance.to.gene <= 500)) %>% 
  mutate(meta_p_het = signif(meta_p_het*10^10, 3)/10^10,
         S_het = round(S_het, 3)) %>% 
  dplyr::select(SNP, S_het, meta_p_het, FBID, gene_name, site.class, distance.to.gene)

scaling_genes %>% 
  my_data_table()
```

Now lets build some 'Manhattan plots' to show where these significant associations can be found:

```{r, fig.width=11, eval=TRUE}
#| column: page

ageing_axis_results <- 
  ageing_axis_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

baseline_mortality_axis_results <- 
  baseline_mortality_axis_meta_results %>% 
  inner_join(Genomic_regions) %>% 
  dplyr::select(SNP, meta_p_hom, meta_p_het) %>% 
  rename(P = meta_p_het) %>% 
  mutate(logp = -log10(P))

# plot the results using the manhattan plot custom function we defined earlier

ageing_axis_het_plot <- 
  build_manhattan_plot(ageing_axis_results) +
  labs(title = "Ageing rate") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 19), expand = c(0, 0))

baseline_mortality_axis_het_plot <- 
  build_manhattan_plot(baseline_mortality_axis_results) +
  labs(title = "Baseline mortality") +
  theme(plot.title = element_text(size = 20, hjust = 0.5)) +
  scale_y_continuous(limits = c(0, 19), expand = c(0, 0))

baseline_mortality_axis_het_plot + ageing_axis_het_plot  
```

**Figure XX**. Manhattan plots showing the -Log~10~ *p*-value for each locus' effect on baseline mortality and the rate of ageing.

Plot the univariate effect sizes for each of the genomic regions associated with the rate of ageing at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC.

```{r, fig.height=9}

SNP_heatmap_ageing_axis <-
  SNP_ageing_axis_data %>% 
  inner_join(
    ageing_axis_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_ageing_axis$SNP
SNP_heatmap_ageing_axis <- SNP_heatmap_ageing_axis %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_ageing_axis) <- row_name

breaksList <- seq(-0.1, 0.1, by = 0.001)

annotation_SNPs <- 
  ageing_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_2010_f_25",
                   "Huang_2020_f_18",
                   "Huang_2020_f_25",
                   "Huang_2020_f_28",
                   "Wilson_2020_f_25_1",
                   "Wilson_2020_f_25_2",
                   "Durham_2014_f_25",
                   "Patel_2021_f_23",
                   "Arya_2010_m_25",
                   "Huang_2020_m_18",
                   "Huang_2020_m_25",
                   "Huang_2020_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")

  pheatmap(SNP_heatmap_ageing_axis, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)

```

**Figure XX**. univariate effect sizes for each of the genomic regions associated with ageing rate at the genome-wide significance threshold (p \< $10^{-8}$) after CPASSOC. Studies are clustered by similiarity in genetic effects on the X axis, while genomic regions are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases ageing rate in the conditions the study was performed in.

Plot the univariate effect sizes for each of the genomic regions associated with the scaling of mortality risk at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC.

```{r, fig.height=9}
SNP_heatmap_baseline_mortality_axis <-
  SNP_baseline_mortality_axis_data %>% 
  inner_join(
    baseline_mortality_axis_meta_results %>% 
      filter(meta_p_het < 1e-08) %>% 
      dplyr::select(SNP) %>% 
      inner_join(Genomic_regions))

row_name <- SNP_heatmap_baseline_mortality_axis$SNP
SNP_heatmap_baseline_mortality_axis <- SNP_heatmap_baseline_mortality_axis %>% dplyr::select(-SNP) %>% as.matrix()
rownames(SNP_heatmap_baseline_mortality_axis) <- row_name

breaksList <- seq(-7, 7, by = 0.01)

annotation_SNPs <- 
  baseline_mortality_axis_meta_results %>% filter(meta_p_het < 1e-08) %>% dplyr::select(SNP) %>% 
  mutate(Chromosome = case_when(str_detect(SNP, "2L") ~ "2L",
                                str_detect(SNP, "2R") ~ "2R",
                                str_detect(SNP, "3L") ~ "3L",
                                str_detect(SNP, "3R") ~ "3R",
                                str_detect(SNP, "X") ~ "X"))

annotation_studies <- 
  tibble(Study = c("Arya_2010_f_25",
                   "Huang_2020_f_18",
                   "Huang_2020_f_25",
                   "Huang_2020_f_28",
                   "Wilson_2020_f_25_1",
                   "Wilson_2020_f_25_2",
                   "Durham_2014_f_25",
                   "Patel_2021_f_23",
                   "Arya_2010_m_25",
                   "Huang_2020_m_18",
                   "Huang_2020_m_25",
                   "Huang_2020_m_28"),
         Temperature = c("25",
                         "18",
                         "25",
                         "28",
                         "25",
                         "25",
                         "25",
                         "23",
                         "25",
                         "18",
                         "25",
                         "28")) %>% 
  mutate(Sex = case_when(str_detect(Study, "_f") ~ "Female",
                         .default = "Male"),
         Mating = case_when(str_detect(Study, "Arya") ~ "NO",
                             str_detect(Study, "Huang") ~ "Throughout life",
                             str_detect(Study, "Wilson") ~ "Early life",
                             str_detect(Study, "Durham") ~ "Throughout life",
                             str_detect(Study, "Patel") ~ "Early life"),
         Author = str_extract(Study, ".*(?=\\_)"),
         Author = str_remove(Author, "_f"),
         Author = str_remove(Author, "_m"))


# create a study annotation column, need this to be a data.frame rather than a tibble for some reason 

Study_details <- annotation_studies %>%
  as.data.frame() %>% 
  dplyr::select(Study, Temperature, Mating)

my_categories <- data.frame(row.names = Study_details[, 1], Temperature = Study_details[, 2],
                            Mating = Study_details[, 3])

my_colors <- list(Temperature = c("18" = "#7bbcd5", # sailboat colours from pnw
                                  "23" = "#d0e2af",
                                  "25" = "#f5db99",
                                  "28" = "#e89c81"),
                  Mating = c("NO" = "#f8e3d1", # Shuksan from pnw
                             "Early life" = "#d7b1c5",
                             "Throughout life" = "#ac8eab"),
                  Chromosome = c("2L" = "#d8aedd", # lake colours from pnw
                                 "2R" = "#cb74ad",
                                 "3L" = "#11c2b5",
                                 "3R" = "#72e1e1",
                                 "X" = "#fbcc74"))
# create a SNP annotation column

SNP_details <- annotation_SNPs %>%
  as.data.frame()

my_SNP_categories <- data.frame(row.names = SNP_details[, 1], Chromosome = SNP_details[, 2])

my_col_names <- c("Arya et al females", "Huang et al females", "Huang et al females",
                  "Huang et al females", "Wilson et al females 1", "Wilson et al females 2", "Durham et al females",
                  "Patel et al females", "Arya et al males", "Huang et al males", "Huang et al males",
                  "Huang et al males")


  pheatmap(SNP_heatmap_baseline_mortality_axis, breaks = breaksList, 
         main = "",
         color = colorRampPalette(rev(met.brewer("Benedictus", direction = 1)))(length(breaksList)),
         legend = TRUE, cutree_rows = 6, cutree_cols = 5, 
         angle_col = 45, border_color = "white",
         annotation_col = my_categories, annotation_colors = my_colors, 
         annotation_row = my_SNP_categories,
         fontsize = 8, labels_col = my_col_names)
```

**Figure XX**. univariate effect sizes for each of the SNPs associated with mortality scaling at the genome-wide significance threshold (p \< $0.05^{-8}$) after CPASSOC. Effect sizes are expressed in standard deviations from the mean life expectancy found in each study. Studies are clustered by similiarity in SNP effects on the X axis, while SNPs are clustered by similarity in effect size across studies on the Y axis. Positive effect sizes indicate that the minor allele increases life expectancy in the conditions the study was performed in.

# Are ageing and baseline mortality polygenic?

If traits are polygenic, the majority of the genetic variants that effect their expession will have effects that are too small to detect with GWA, unless sample sizes are truly gigantic. A promising alternative is to instead look to see if effects estimated in one study can be replicated in a second, independent study. To test this in our dataset, we selected one trait measurement from each study trait that phenotyped females, at 25C, with an opportunity for mating.

As a control, this is what happens if we bin and plot the relationship between two uncorrelated variables

```{r}
sim_data <-
  tibble(draw_1 = rnorm(220437, 0, 1),
         draw_2 = rnorm(220437, 0, 1)) %>%
  arrange(draw_1) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(draw_1 = mean(draw_1), draw_2 = mean(draw_2))

(boyle_plot_sim <-
  sim_data %>%
  ggplot(aes(draw_1, draw_2)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-4, 4), ylim = c(-4, 4)) +
  xlab("Mean effect size \n (random draw 1)") +
  ylab("Mean effect size \n (random draw 2)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 14))
)

```

```{r}
ageing_boyle_data <-
  SNP_ageing_axis_LD_pruned %>% 
  dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>% 
  filter_at(vars(2:4), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Huang_2020_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Huang_2020_25_Female = mean(Huang_2020_25_Female), 
            Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_2014_25_Female = mean(Durham_2014_25_Female))

ageing_boyle_data_2 <-
  SNP_ageing_axis_LD_pruned %>% 
  dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_2014_25_Female) %>% 
  filter_at(vars(2:3), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Wilson_2020_1_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_2014_25_Female = mean(Durham_2014_25_Female))

baseline_mortality_boyle_data <-
  SNP_baseline_mortality_axis_LD_pruned %>% 
  dplyr::select(SNP, Huang_2020_25_Female, Wilson_2020_1_25_Female, Durham_25_Female) %>% 
  filter_at(vars(2:4), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Huang_2020_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Huang_2020_25_Female = mean(Huang_2020_25_Female), 
            Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_25_Female = mean(Durham_25_Female))

baseline_mortality_boyle_data_2 <-
  SNP_baseline_mortality_axis_LD_pruned %>% 
  dplyr::select(SNP, Wilson_2020_1_25_Female, Durham_25_Female) %>% 
  filter_at(vars(2:3), all_vars(!is.na(.))) %>% # remove NAs
  arrange(Wilson_2020_1_25_Female) %>%
  mutate(bin = c(rep(1:floor(n()/100), each = 100),
                 rep(floor(n()/100) + 1, each = n() %% 100))) %>%
  group_by(bin) %>%
  summarise(Wilson_2020_1_25_Female = mean(Wilson_2020_1_25_Female),
            Durham_25_Female = mean(Durham_25_Female))

boyle_plot_H_W <-
  ageing_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  xlab("Ageing SNP effect (Huang et al.)") +
  ylab("Ageing SNP effect (Wilson et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_plot_H_D <-
  ageing_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Durham_2014_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  labs(x = "Ageing SNP effect (Huang et al.)",
       y = "Ageing SNP effect (Durham et al.)") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        text = element_text(size = 10))

boyle_plot_W_D <-
  ageing_boyle_data_2 %>%
  ggplot(aes(Wilson_2020_1_25_Female, Durham_2014_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-0.16, 0.16), ylim = c(-0.1, 0.1)) +
  xlab("Ageing SNP effect (Wilson et al.)") +
  ylab("Ageing SNP effect (Durham et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_baseline_plot_H_W <-
  baseline_mortality_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Wilson_2020_1_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Huang et al.)",
       y = "Scaling SNP effect (Wilson et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

boyle_baseline_plot_H_D <-
  baseline_mortality_boyle_data %>%
  ggplot(aes(Huang_2020_25_Female, Durham_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Huang et al.)",
       y = "Scaling SNP effect (Durham et al.)") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5),
        text = element_text(size = 10))

boyle_baseline_plot_W_D <-
  baseline_mortality_boyle_data_2 %>%
  ggplot(aes(Wilson_2020_1_25_Female, Durham_25_Female)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_point(alpha = 0.8, size = 2.2) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), linewidth = 0.75) +
    coord_cartesian(xlim = c(-5, 5), ylim = c(-3.125, 3.125)) +
    labs(x = "Scaling SNP effect (Wilson et al.)",
       y = "Scaling SNP effect (Durham et al.)") +
  theme_bw() +
  theme(strip.background = element_blank(),
        strip.text = element_text(hjust=0)) +
  theme(text = element_text(size = 10))

(boyle_plot_H_W + boyle_plot_H_D + boyle_plot_W_D) /
  (boyle_baseline_plot_H_W + boyle_baseline_plot_H_D + boyle_baseline_plot_W_D) 

```

**Figure SX**. Each point represents the mean effect size for a group of 100 genomic regions, ordered by association with female ageing rate (top panels) or female baseline mortality risk (bottom panels), measured in the study named on the x-axis. While traits were measured different laboratories, conditions were similar in each treatment: females were housed at 25C, with an opportunity for mating. Effect sizes are expressed as trait standard deviations.

# Figure 4

```{r, fig.width=8.3, fig.height=6.97, eval=TRUE}

f4_a <- c + labs(title = NULL) + theme(legend.position="none")
f4_b <- e + labs(title = NULL) + theme(legend.position = "none")
f4_c <- g + labs(title = NULL) + theme(legend.position = "none")

part_1 <-
  (f4_a + f4_b + f4_c) +
  plot_layout(#guides = collect, 
              axis_titles = "collect")

f4_e <-
  boyle_plot_H_W  + 
  labs(x = "SNP effect (Huang et al.)",
       y = "SNP effect (Wilson et al.)") 

f4_f <-
  boyle_plot_H_D +
  labs(x = "SNP effect (Huang et al.)",
       y = "SNP effect (Durham et al.)") 

f4_g <-
  boyle_plot_W_D +
  labs(x = "SNP effect (Wilson et al.)",
       y = "SNP effect (Durham et al.)") 
  
part_3 <- (f4_e + f4_f + f4_g)
  

part_1 / (ageing_axis_het_plot + labs(title = NULL)) / part_3 + plot_annotation(tag_levels = "A")
```

**Figure 4.** detection of genetic variants associated with the rate of ageing. **A**-**C** demonstrate our ageing rate metric used for genome-wide association analysis. Dashed lines show simulations from the gompertz distribution: each line was generated with a different rate of ageing value and extends as the baseline mortality rate changes. Note that the slope from the regressions of lifespan equality on life expectancy align closely with these curves. Points show fly genotypes; deviations from the regression line therefore indicate that genotypes differ in the rate of ageing.